This paper presents multidisciplinary and interdisciplinary approaches to selecting appropriate AI technologies for research information. Professional research information management (RIM) is becoming increasingly important as an explicitly data-driven tool for researchers. It not only forms the basis of scientific knowledge processes but is also linked to other data. We present a concept and a process model covering the elementary phases of an AI project, from project start to the ongoing operation of AI methods in the RIM. The model is meant to enable universities and research institutions to support their researchers in dealing with incorrect and incomplete research information as it is stored in their RIM systems. Our aim is to show how research information relates to the challenges of data literacy and data quality in the context of AI. We also want to underline that any such project can succeed if the research institutions and the university departments involved work together and appropriate support is offered to improve research information and data management.
Demographic data collection is essential in education research, as demographic data allow researchers to better describe the participant population they study and to contextualize findings. However, current research practices for neurodiversity demographics often rely on prescriptive methods (e.g., requiring participants to report official diagnoses) rather than allowing participants to self-identify. This approach can (a) prevent participants from expressing their intersecting identities in authentic ways and (b) limit the trustworthiness and reliability of the data and their interpretation. In addition, inconsistent dissemination and representation of demographic data across studies hinder the accessibility and usability of this work. Through a literature review of neurodivergent student experiences with learning and performing STEM, we identified widespread discrepancies in how demographic information is collected and reported. This paper explores how neurodivergent identities can be more accurately and inclusively represented in education research. We present findings of a thematic analysis of how neurodivergent demographic data are collected in the literature using data
In clinical research and clinical decision-making, it is important to know whether a study changes or only supports the current standards of care for the management of a specific disease. We define research that changes the standard as transformative and research that supports it as incremental. Making this distinction usually requires substantial domain expertise and time. Faculty Opinions provides a well-annotated corpus indicating whether a study challenges or only confirms established research. In this study, we propose a machine learning approach to distinguish transformative from incremental clinical evidence. The texts of both the abstract and a 2-year window of citing sentences are collected for a training set of clinical studies recommended and labeled by Faculty Opinions experts. We achieve the best performance, an average AUC of 0.755 (0.705-0.875), using Random Forest as the classifier and citing sentences as the feature. The results show that transformative research has typical language patterns in citing sentences, unlike in abstract sentences. We provide an efficient tool for identifying clinical evidence that challenges or only confirms established claims for clinicians and researc
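The AUC reported above can be read as the probability that a randomly chosen transformative study receives a higher classifier score than a randomly chosen incremental one. The following is a minimal sketch of that rank-based definition, not the authors' evaluation code; the function name `roc_auc`, the labels, and the scores are all hypothetical illustrations.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = transformative, 0 = incremental.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which gives a scale for reading the reported 0.755.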
In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting the potential of existing models to enhance automation levels. To bridge this gap, this paper presents V-Zen, an innovative Multimodal Large Language Model (MLLM) meticulously crafted to revolutionise the domain of GUI understanding and grounding. Equipped with dual-resolution image encoders, V-Zen establishes new benchmarks in efficient grounding and next-action prediction, thereby laying the groundwork for self-operating computer systems. Complementing V-Zen is the GUIDE dataset, an extensive collection of real-world GUI elements and task-based sequences, serving as a catalyst for specialised fine-tuning. The successful integration of V-Zen and GUIDE marks the dawn of a new era in multimodal AI research, opening the door to intelligent, autonomous computing experiences. This paper extends
This study employs scientometric methods to assess the research output and performance of the University of Ibadan from 2014 to 2023. By analyzing publication trends, citation patterns, and collaboration networks, the research aims to comprehensively evaluate the university's research productivity, impact, and disciplinary focus. The university's research endeavors are characterized by innovation, interdisciplinary collaboration, and commitment to excellence, making the University of Ibadan a significant hub for cutting-edge research in Nigeria and beyond. The goal of the current study is to ascertain the influence of the university's research output and publication patterns between 2014 and 2023. The study focuses on the research output broken down by year, the impact of citations both locally and globally, well-known authors and their total production, the best journals for publishing, the collaborating nations, and the departments at the University of Ibadan that contribute the most. According to the university's ten-year publication data, 7,159 papers with an h-index of 75 were published between 2014 and 2023, garnering 218,572 citations. Furthermore, the VOSviewer software mapping appro
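For readers unfamiliar with the h-index cited above: a set of papers has index h when h of them have at least h citations each. The sketch below illustrates that standard definition on a hypothetical list of citation counts, since the per-paper data are not given in the abstract. The mean citations per paper is computed from the two totals the abstract does report.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical per-paper citation counts, for illustration only.
toy_citations = [10, 8, 5, 4, 3, 0]
toy_h = h_index(toy_citations)

# Totals taken from the abstract: 218,572 citations over 7,159 papers.
mean_citations_per_paper = 218572 / 7159
```

In the toy list the four most-cited papers each have at least four citations, but the fifth has only three, so the index stops at four.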
This paper presents a scientometric analysis of research output from the University of Lagos, focusing on the two decades spanning 2004 to 2023. Using bibliometric data retrieved from the Web of Science, we examine trends in publication volume, collaboration patterns, citation impact, and the most prolific authors, departments, and research domains at the university. The study reveals a consistent increase in research productivity, with the highest publication output recorded in 2023. Health Sciences, Engineering, and Social Sciences are identified as dominant fields, reflecting the university's interdisciplinary research strengths. Collaborative efforts, both locally and internationally, show a positive correlation with higher citation impact, with the United States and the United Kingdom being the leading international collaborators. Notably, open-access publications account for a significant portion of the university's research output, enhancing visibility and citation rates. The findings offer valuable insights into the university's research performance over the past two decades, providing a foundation for strategic planning and policy formulation to foster research excellence
Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: For the 423 patients, 101 deteriorated, 223 improved and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Le
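As a rough point of comparison for the 70-72% accuracy above, the reported outcome counts can be turned into a majority-class baseline. This sketch assumes the prediction target was the three outcome categories listed in the Results, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Outcome counts as reported in the abstract.
outcomes = {"deteriorated": 101, "improved": 223, "no_change": 99}

total = sum(outcomes.values())
shares = {label: count / total for label, count in outcomes.items()}

# A trivial classifier that always predicts the most frequent outcome
# would be right exactly as often as that outcome occurs.
majority_label = max(outcomes, key=outcomes.get)
baseline_accuracy = outcomes[majority_label] / total
```

Under that assumption, always predicting "improved" would be correct for about 53% of patients, so the reported models add roughly 17 to 19 percentage points over the trivial baseline.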
Consent is an ethical cornerstone of clinical research and healthcare in general. Although the ethical principles of consent - providing information, ensuring comprehension, and ensuring voluntariness - are well-defined, the technological infrastructure remains outdated. Clinicians are responsible for obtaining informed consent from research subjects or patients, and for managing it before, during, and after clinical trials or care, which is a burden for them. The voluntary nature of participating in clinical research or undergoing medical treatment implies the need for a participant-centric consent management system. However, this is not reflected in most established systems. Not only do most healthcare information systems not follow a user-centric model, but they also create data silos, which significantly reduce the mobility of patient data between different healthcare institutions and impact personalized medicine. Furthermore, consent management tools are outdated. We propose ClinConNet (Clinical Consent Network), a platform that connects researchers and participants based on clinical research projects. ClinConNet is powered by a dynamic consent model based on blockchain and ta
Developing artificial intelligence (AI) for clinical research requires a comprehensive data foundation that supports model training and rigorous evaluation. Here, we introduce TrialPanorama, a large-scale structured resource that aggregates 1.6M clinical trial records from fifteen global registries and links them with biomedical ontologies and associated literature. To demonstrate its utility, we build a pipeline that constructs 152K training and testing samples for eight key clinical research tasks. Three tasks support systematic review workflows, including study search, study screening, and evidence summarization. Five tasks focus on trial design and optimization, including arm design, eligibility criteria design, endpoint selection, sample size estimation, and trial completion assessment and rationalization. Benchmarking cutting-edge large language models (LLMs) reveals that generic LLMs have limited capability in clinical reasoning. In contrast, an 8B LLM we developed on TrialPanorama using supervised finetuning and reinforcement learning wins over the 70B generic counterparts in all eight tasks, with a relative improvement of 73.7%, 67.6%, 38.4%, 37.8%, 26.5%, 20.7%, 20.0%, 18
Background and Aims: This study evaluates the medical reasoning performance of large language models (LLMs) and vision language models (VLMs) in gastroenterology. Methods: We used 300 gastroenterology board exam-style multiple-choice questions, 138 of which contain images, to systematically assess the impact of model configurations, parameters, and prompt engineering strategies using GPT-3.5. Next, we assessed the performance of proprietary and open-source LLMs (versions), including GPT (3.5, 4, 4o, 4omini), Claude (3, 3.5), Gemini (1.0), Mistral, Llama (2, 3, 3.1), Mixtral, and Phi (3), across different interfaces (web and API), computing environments (cloud and local), and model precisions (with and without quantization). Finally, we assessed accuracy using a semiautomated pipeline. Results: Among the proprietary models, GPT-4o (73.7%) and Claude3.5-Sonnet (74.0%) achieved the highest accuracy, outperforming the top open-source models: Llama3.1-405b (64%), Llama3.1-70b (58.3%), and Mixtral-8x7b (54.3%). Among the quantized open-source models, the 6-bit quantized Phi3-14b (48.7%) performed best. The scores of the quantized models were comparable to those of the full-precision
The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial optimisation problems, for which numerous models and algorithms have been proposed. To tackle the complexities, uncertainties and dynamics involved in real-world VRP applications, Machine Learning (ML) methods have been used in combination with analytical approaches to enhance problem formulations and algorithmic performance across different problem-solving scenarios. However, the relevant papers are scattered across several traditional research fields with very different, sometimes confusing, terminologies. This paper presents a first comprehensive review of hybrid methods that combine analytical techniques with ML tools in addressing VRPs. Specifically, we review the emerging research streams on ML-assisted VRP modelling and ML-assisted VRP optimisation. We conclude that ML can be beneficial in enhancing VRP modelling and in improving the performance of algorithms for both online and offline VRP optimisation. Finally, challenges and future opportunities of VRP research are discussed.
This scientometric study analyzes Avian Influenza research from 2014 to 2023 using bibliographic data from the Web of Science database. We examined publication trends, sources, authorship, collaborative networks, document types, and geographical distribution to gain insights into the global research landscape. Results reveal a steady increase in publications, with high contributions from Chinese and American institutions. Journals such as PLoS One and the Journal of Virology published the highest number of studies, indicating their influence in this field. The most prolific institutions include the Chinese Academy of Sciences and the University of Hong Kong, while the College of Veterinary Medicine at South China Agricultural University emerged as the most productive department. China and the USA lead in publication volume, though developed nations like the United Kingdom and Germany exhibit a higher rate of international collaboration. "Articles" are the most common document type, constituting 84.6% of the total, while "Reviews" account for 7.6%. This study provides a comprehensive view of global trends in Avian Influenza research, emphasizing the need for collaborative efforts ac
Artificial intelligence (AI) research is routinely criticized for its real and potential impacts on society, and we lack adequate institutional responses to this criticism and to the responsibility that it reflects. AI research often falls outside the purview of existing feedback mechanisms such as the Institutional Review Board (IRB), which are designed to evaluate harms to human subjects rather than harms to human society. In response, we have developed the Ethics and Society Review board (ESR), a feedback panel that works with researchers to mitigate negative ethical and societal aspects of AI research. The ESR's main insight is to serve as a requirement for funding: researchers cannot receive grant funding from a major AI funding program at our university until the researchers complete the ESR process for the proposal. In this article, we describe the ESR as we have designed and run it over its first year across 41 proposals. We analyze aggregate ESR feedback on these proposals, finding that the panel most commonly identifies issues of harms to minority groups, inclusion of diverse stakeholders in the research plan, dual use, and representation in data. Surveys and interviews o
This study employs scientometric methods to assess the research output and performance of the University of Nigeria from 2014 to 2023. By analyzing publication trends, citation patterns, and collaboration networks, the research aims to comprehensively evaluate the university's research productivity, impact, and disciplinary focus. These research endeavors are characterized by innovation, interdisciplinary collaboration, and commitment to excellence, making the University of Nigeria a significant hub for cutting-edge research in Nigeria and beyond. The present study has been undertaken to determine the impact of the university's research and publication trends from 2014 to 2023. The study focuses on year-wise research output, citation impact at local and global levels, prominent authors and their total output, top journals, collaborating countries, and the most contributing departments of the University of Nigeria. The university's ten years of publication data indicate that 6,353 papers were published from 2014 to 2023, receiving 86,202 citations with an h-index of 39. In addition to this, the stenographical mapping of data is presented through graphs using the VOSviewer software m
We present MH-1M, one of the most comprehensive and up-to-date datasets for advanced Android malware research. The dataset comprises 1,340,515 applications, encompassing a wide range of features and extensive metadata. To ensure accurate malware classification, we employ the VirusTotal API, integrating multiple detection engines for comprehensive and reliable assessment. Our GitHub, Figshare, and Harvard Dataverse repositories provide open access to the processed dataset and its extensive supplementary metadata, totaling more than 400 GB of data and including the outputs of the feature extraction pipeline as well as the corresponding VirusTotal reports. Our findings underscore the MH-1M dataset's invaluable role in understanding the evolving landscape of malware.
Modern research heavily relies on software. A significant challenge researchers face is understanding the complex software used in specific research fields. We target two scenarios in this context, namely long onboarding times for newcomers and conference reviewers evaluating replication packages. We hypothesize that both scenarios can be significantly improved when there is a clear link between the paper's ideas and the code that implements them. As a time- and staff-saving approach, we propose an LLM-based automation tool that takes in a paper and the software implementing the paper, and generates a trace mapping between research ideas and their locations in code. Initial experiments have shown that the tool can generate quite useful mappings.
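To make the idea of a trace mapping concrete, the sketch below shows one possible data structure linking a research idea to the code locations that implement it. It is an assumption about what such a mapping could look like, not the tool's actual output format; the class names, field names, file paths, and example ideas are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class CodeLocation:
    """A span of lines in one source file (hypothetical format)."""
    path: str
    start_line: int
    end_line: int


@dataclass
class TraceLink:
    """Connects one idea from the paper to the code implementing it."""
    idea: str
    paper_section: str
    locations: List[CodeLocation] = field(default_factory=list)


def locations_for(links, keyword):
    """Return all code locations whose idea text mentions the keyword."""
    keyword = keyword.lower()
    return [
        loc
        for link in links
        if keyword in link.idea.lower()
        for loc in link.locations
    ]


# Hypothetical mapping for an imaginary replication package.
links = [
    TraceLink("Adaptive sampling strategy", "Section 3.1",
              [CodeLocation("src/sampler.py", 10, 42)]),
    TraceLink("Loss function with regularizer", "Section 3.2",
              [CodeLocation("src/loss.py", 5, 30),
               CodeLocation("src/train.py", 88, 97)]),
]
sampling_code = locations_for(links, "sampling")
```

A newcomer or reviewer could query such a structure by idea and jump straight to the relevant files, which is the onboarding and replication scenario the abstract targets.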
The recent advent of connected and automated vehicles (CAVs) is expected to transform the transportation system. CAV technologies are being developed rapidly and are expected to penetrate the market quickly. At the same time, work zones (WZs) have become common on highway systems as a result of increasing construction and maintenance activities. The near future will therefore bring the coexistence of CAVs and WZs, which makes their interaction inevitable. WZs expose all vehicles to a sudden and complex geometric change in the roadway environment, something that may challenge many CAV navigation capabilities. However, WZs also impose a space contraction resulting in adverse traffic impacts, something that legitimately calls for benefiting from the highly efficient CAV functions. CAVs should be able to reliably traverse WZ geometry and WZs should benefit from CAV intelligent functions. This paper reviews the state-of-the-art and the key concepts, opportunities, and challenges of deploying CAV systems at WZs. The reviewed subjects include traffic performance and behaviour, technologies and infrastructure, and regulatory considerations. Eighteen CAV mobility, s
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has been made possible by deep learning, a sub-area of machine learning that uses artificial neural networks. This has created significant interest in the integration of vision and language. In this survey, we focus on ten prominent tasks that integrate language and vision, discussing their problem formulations, methods, existing datasets, and evaluation measures, and comparing the results obtained with the corresponding state-of-the-art methods. Our efforts go beyond earlier surveys, which are either task-specific or concentrate on only one type of visual content, i.e., image or video. Furthermore, we provide some potential future directions in this field of research, in the anticipation that this survey will stimulate innovative thoughts and ideas to address the existing challenges and build new applications.
Software is at the core of most scientific discoveries today. Therefore, the quality of research results highly depends on the quality of the research software. Rigorous testing, as we know it from software engineering in the industry, could ensure the quality of the research software but it also requires a substantial effort that is often not rewarded in academia. Therefore, this research explores the effects of research software testing integrated into teaching on research software. In an in-vivo experiment, we integrated the engineering of a test suite for a large-scale network simulation as group projects into a course on software testing at the Blekinge Institute of Technology, Sweden, and qualitatively measured the effects of this integration on the research software. We found that the research software benefited from the integration through substantially improved documentation and fewer hardware and software dependencies. However, this integration was effortful and although the student teams developed elegant and thoughtful test suites, no code by students went directly into the research software since we were not able to make the integration back into the research software
Case-oriented physics education research - which seeks to refine and develop theory by linking that theory to cases - incorporates distinct practices for selecting data for analysis, generalizing results, and making causal claims. Unanswered questions about these practices may constrain researchers more familiar with the recurrence-oriented research paradigm - which seeks to inform instructional predictions by discerning reproducible, representative patterns and relationships - from participating in or critically engaging with case-oriented research. We use results from interviews with physics education researchers, a synthesis of the literature on research methodologies, and published examples of case-oriented and recurrence-oriented research to answer "hard-hitting questions" that researchers may pose. In doing so, we aim to substantiate our position that both case-oriented and recurrence-oriented PER are rigorous but that the rigor is of a different nature in each paradigm.