Clinical data-driven research requires clinical expertise, programming skills, access to patient data, and extensive documentation, creating barriers and slowing the pace for clinicians and external researchers. To address this, we developed the Clinical Agentic Research Intelligence System (CARIS) that automates the workflow: research planning, literature search, cohort construction, Institutional Review Board (IRB) documentation, Vibe Machine Learning (ML), and report generation, with human-in-the-loop refinement. CARIS integrates Large Language Models (LLMs) with modular tools through the Model Context Protocol (MCP), enabling natural language-driven research without coding while allowing users to access only outputs. We evaluated CARIS on three heterogeneous datasets with distinct clinical tasks, where it completed planning and IRB documentation within four iterations, supported Vibe ML, and generated reports, achieving 96% completeness in LLM-based evaluation and 82% in human evaluation. CARIS demonstrates potential to reduce documentation burden and technical barriers, accelerating data-driven clinical research across public and private data environments.
Integrating brain imaging data with clinical reports offers a valuable opportunity to leverage complementary multimodal information for more effective and timely diagnosis in practical clinical settings. This approach has gained significant attention in brain disorder research, yet a key challenge remains: how to effectively link objective imaging data with subjective text-based reports, such as doctors' notes. In this work, we propose a novel framework that aligns brain connectomes with clinical reports in a shared cross-modal latent space at both the subject and connectome levels, thereby enhancing representation learning. The key innovation of our approach is that we treat brain subnetworks as tokens of imaging data, rather than raw image patches, to align with word tokens in clinical reports. This enables a more efficient identification of system-level associations between neuroimaging findings and clinical observations, which is critical since brain disorders often manifest as network-level abnormalities rather than isolated regional alterations. We applied our method to mild cognitive impairment (MCI) using the ADNI dataset. Our approach not only achieves state-of-the-art pre
Demographic data collection is essential in education research, as demographic data allows researchers to better describe the participant population they study and to contextualize findings. However, current research practices for neurodiversity demographics often rely on prescriptive methods (e.g., requiring participants to report official diagnoses) rather than allowing participants to self-identify. This approach can: a) not allow participants to express their intersecting identities in ways that are authentic; and b) limit trustworthiness and reliability of the data and interpretation. In addition, inconsistent dissemination and representation of demographic data across studies hinder the accessibility and usability of this work. Through a literature review of neurodivergent student experiences with learning and performing STEM, we identified widespread discrepancies in how demographic information is collected and reported. This paper explores how neurodivergent identities can be more accurately and inclusively represented in education research. We present findings of a thematic analysis on the ways neurodivergent demographic data collection is done in the literature using data
This paper presents a scientometric analysis of research output from the University of Lagos, focusing on the two decades spanning 2004 to 2023. Using bibliometric data retrieved from the Web of Science, we examine trends in publication volume, collaboration patterns, citation impact, and the most prolific authors, departments, and research domains at the university. The study reveals a consistent increase in research productivity, with the highest publication output recorded in 2023. Health Sciences, Engineering, and Social Sciences are identified as dominant fields, reflecting the university's interdisciplinary research strengths. Collaborative efforts, both locally and internationally, show a positive correlation with higher citation impact, with the United States and the United Kingdom being the leading international collaborators. Notably, open-access publications account for a significant portion of the university's research output, enhancing visibility and citation rates. The findings offer valuable insights into the university's research performance over the past two decades, providing a foundation for strategic planning and policy formulation to foster research excellence
The competency of any intelligent agent is bounded by its formal account of the world in which it operates. Clinical AI lacks such an account. Existing frameworks address evaluation, regulation, or system design in isolation, without a shared model of the clinical world to connect them. We introduce the Clinical World Model, a framework that formalizes care as a tripartite interaction among Patient, Provider, and Ecosystem. To formalize how any agent, whether human or artificial, transforms information into clinical action, we develop parallel decision-making architectures for providers, patients, and AI agents, grounded in validated principles of clinical cognition. The Clinical AI Skill-Mix operationalizes competency through eight dimensions. Five define the clinical competency space (condition, phase, care setting, provider role, and task) and three specify how AI engages human reasoning (assigned authority, agent facing, and anchoring layer). The combinatorial product of these dimensions yields a space of billions of distinct competency coordinates. A central structural implication is that validation within one coordinate provides minimal evidence for performance in another, re
Purpose: We investigated the utilization of privacy-preserving, locally-deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports.
Clinical Pathways (CP) are medical management plans developed to standardize patient treatment activities, optimize resource usage, reduce expenses, and improve the quality of healthcare services. Most CPs currently in use are paper-based documents (i.e., not computerized). CP computerization has been an active research topic since the inception of CP use in hospitals. This literature review research aims to examine studies that focused on CP computerization and offers recommendations for future research in this important research area. Some critical research suggestions include centralizing computerized CPs in Healthcare Information Systems (HIS), CP term standardization using international medical terminology systems, developing a global CP-specific digital coding system, creating a unified CP meta-ontology, developing independent Clinical Pathway Management Systems (CPMS), and supporting CPMSs with machine learning sub-systems.
Empiric antibiotic prescribing in high-risk clinical contexts often requires decision making under conditions of incomplete information, where inappropriate coverage or unjustified escalation may compromise safety and antimicrobial stewardship. While clinical decision-support systems have been proposed to assist in this process, many approaches lack explicit governance and evaluation mechanisms defining scope, abstention conditions, recommendation permissibility, and expected system behavior. This work specifies a governance and evaluation framework for deterministic clinical decision-support systems operating under explicitly constrained scope. Deterministic behavior is adopted to ensure that identical inputs yield identical outputs, supporting transparency, auditability, and conservative decision support in high-risk prescribing contexts. The framework treats governance as a first-class design component, separating clinical decision logic from rule-based mechanisms that determine whether a recommendation may be issued. Explicit abstention, deterministic stewardship constraints, and exclusion rules are formalized as core constructs. The framework defines an evaluation methodology
We introduce Clinical ModernBERT, a transformer based encoder pretrained on large scale biomedical literature, clinical notes, and medical ontologies, incorporating PubMed abstracts, MIMIC IV clinical data, and medical codes with their textual descriptions. Building on ModernBERT the current state of the art natural language text encoder featuring architectural upgrades such as rotary positional embeddings (RoPE), Flash Attention, and extended context length up to 8,192 tokens our model adapts these innovations specifically for biomedical and clinical domains. Clinical ModernBERT excels at producing semantically rich representations tailored for long context tasks. We validate this both by analyzing its pretrained weights and through empirical evaluation on a comprehensive suite of clinical NLP benchmarks.
Developing AI models that are useful in clinical practice, requires efficient collaboration between clinicians and AI developers. This poses a practical challenge: clinicians must repeatedly communicate and refine their requirements with AI developers before those requirements can be translated into executable model development. This iterative process is time-consuming, and even after repeated discussion, misalignment may still exist because the two sides do not fully share each other's expertise. Coding agents may help close this gap. They can write and refine code on their own, and they carry working knowledge of both medicine and AI to understand commands formulated by both medical experts and developers. We present a prototype that lets clinicians drive AI development directly. A clinician describes the task in plain language, and the system turns the description into a working pipeline, refines it through repeated experiments together with the clinician, and returns a model that meets the stated clinical objective. Across five clinical tasks, the system reliably produces models that matched the clinician's request and reached competitive performance. Most notably, on chest rad
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative and unstructured nature of clinical notes is a major obstacle for healthcare intelligentization. We address a critical problem of structuring clinical notes into clinical data, according to international interoperability standards. We collect and annotate data for three subtasks, namely, international patient summary, clinical impression and medical encounter. We then supervised fine-tuned a state-of-the-art LLM using public and credentialed clinical data. The training is orchestrated in a way that the target model can first support basic clinical tasks such as abbreviation expansion and temporal information extraction, and then learn to perform more complex downstream clinical tasks. Moreover, we address several modeling challenges in the healthcare context, e.g., extra long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, comparable to Gemini-pro, with a mild gap from GPT-4. We believe that LLMs may become a step-stone towards healthcare digitalization and democratization.
Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: For the 423 patients, 101 deteriorated, 223 improved and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Le
Biomedical research encompasses diverse types of activities, from basic science ("bench") to clinical medicine ("bedside") to bench-to-bedside translational research. It, however, remains unclear whether different types of research receive citations at varying rates. Here we aim to answer this question by using a newly proposed paper-level indicator that quantifies the extent to which a paper is basic science or clinical medicine. Applying this measure to 5 million biomedical papers, we find a systematic citation disadvantage of clinical oriented papers; they tend to garner far fewer citations and are less likely to be hit works than papers oriented towards basic science. At the same time, clinical research has a higher variance in its citation. We also find that the citation difference between basic and clinical research decreases, yet still persists, if longer citation-window is used. Given the increasing adoption of short-term, citation-based bibliometric indicators in funding decisions, the under-cited effect of clinical research may provide disincentives for bio-researchers to venture into the translation of basic scientific discoveries into clinical applications, thus providi
Clinical decision-making relies on the integrated analysis of medical images and the associated clinical reports. While Vision-Language Models (VLMs) can offer a unified framework for such tasks, they can exhibit strong biases toward one modality, frequently overlooking critical visual cues in favor of textual information. In this work, we introduce Selective Modality Shifting (SMS), a perturbation-based approach to quantify a model's reliance on each modality in binary classification tasks. By systematically swapping images or text between samples with opposing labels, we expose modality-specific biases. We assess six open-source VLMs-four generalist models and two fine-tuned for medical data-on two medical imaging datasets with distinct modalities: MIMIC-CXR (chest X-ray) and FairVLMed (scanning laser ophthalmoscopy). By assessing model performance and the calibration of every model in both unperturbed and perturbed settings, we reveal a marked dependency on text input, which persists despite the presence of complementary visual information. We also perform a qualitative attention-based analysis which further confirms that image content is often overshadowed by text details. Our
Clinical trials are critical in advancing medical treatments but often suffer from immense time and financial burden. Advances in statistical methodologies and artificial intelligence (AI) present opportunities to address these inefficiencies. Here we introduce Prognostic Covariate-Adjusted Mixed Models for Repeated Measures (PROCOVA-MMRM) as an advantageous combination of prognostic covariate adjustment (PROCOVA) and Mixed Models for Repeated Measures (MMRM). PROCOVA-MMRM utilizes time-matched prognostic scores generated from AI models to enhance the precision of treatment effect estimators for longitudinal continuous outcomes, enabling reductions in sample size and enrollment times. We first provide a description of the background and implementation of PROCOVA-MMRM, followed by two case study reanalyses where we compare the performance of PROCOVA-MMRM versus the unadjusted MMRM. These reanalyses demonstrate significant improvements in statistical power and precision in clinical indications with unmet medical need, specifically Alzheimer's Disease (AD) and Amyotrophic Lateral Sclerosis (ALS). We also explore the potential for sample size reduction with the prospective implementati
Medical report generation aims to automatically produce radiology-style reports from medical images, supporting efficient and accurate clinical decision-making.However, existing approaches predominately rely on token-level likelihood training, which favors local lexical matching and leaves clinical correctness under-specified in the training objective. This behavior can be attributed to token-level likelihood optimization, which rewards surface-form agreement and therefore fails to directly encode constraints on medically accurate findings. To address this objective mismatch, we introduce a semantic-driven reinforcement learning (SRL) framework for medical report generation, named MRG-R1, which directly optimizes report-level clinical correctness rather than token-level likelihood. The key module is a clinically grounded report-level reward function, which reinforces semantic agreement in clinically relevant findings between generated and reference reports, thereby enabling learning signals that explicitly constrain medical correctness beyond surface linguistic alignment. Our evaluations show that the proposed framework improves the accuracy and coverage of clinically relevant find
Objective: Integrating EHR data with other resources is essential in rare disease research due to low disease prevalence. Such integration is dependent on the alignment of ontologies used for data annotation. The International Classification of Diseases (ICD) is used to annotate clinical diagnoses; the Human Phenotype Ontology (HPO) to annotate phenotypes. Although these ontologies overlap in biomedical entities described, the extent to which they are interoperable is unknown. We investigate how well aligned these ontologies are and whether such alignments facilitate EHR data integration. Materials and Methods: We conducted an empirical analysis of the coverage of mappings between ICD and HPO. We interpret this mapping coverage as a proxy for how easily clinical data can be integrated with research ontologies such as HPO. We quantify how exhaustively ICD codes are mapped to HPO by analyzing mappings in the UMLS Metathesaurus. We analyze the proportion of ICD codes mapped to HPO within a real-world EHR dataset. Results and Discussion: Our analysis revealed that only 2.2% of ICD codes have direct mappings to HPO in UMLS. Within our EHR dataset, less than 50% of ICD codes have mapping
This scientometric study analyzes Avian Influenza research from 2014 to 2023 using bibliographic data from the Web of Science database. We examined publication trends, sources, authorship, collaborative networks, document types, and geographical distribution to gain insights into the global research landscape. Results reveal a steady increase in publications, with high contributions from Chinese and American institutions. Journals such as PLoS One and the Journal of Virology published the highest number of studies, indicating their influence in this field. The most prolific institutions include the Chinese Academy of Sciences and the University of Hong Kong, while the College of Veterinary Medicine at South China Agricultural University emerged as the most productive department. China and the USA lead in publication volume, though developed nations like the United Kingdom and Germany exhibit a higher rate of international collaboration. "Articles" are the most common document type, constituting 84.6% of the total, while "Reviews" account for 7.6%. This study provides a comprehensive view of global trends in Avian Influenza research, emphasizing the need for collaborative efforts ac
Automated structured radiology report generation (SRRG) from chest X-ray images offers significant potential to reduce workload of radiologists by generating reports in structured formats that ensure clarity, consistency, and adherence to clinical reporting standards. While radiologists effectively utilize available clinical contexts in their diagnostic reasoning, existing SRRG systems overlook these essential elements. This fundamental gap leads to critical problems including temporal hallucinations when referencing non-existent clinical contexts. To address these limitations, we propose contextualized SRRG (C-SRRG) that comprehensively incorporates rich clinical context for SRRG. We curate C-SRRG dataset by integrating comprehensive clinical context encompassing 1) multi-view X-ray images, 2) clinical indication, 3) imaging techniques, and 4) prior studies with corresponding comparisons based on patient histories. Through extensive benchmarking with state-of-the-art multimodal large language models, we demonstrate that incorporating clinical context with the proposed C-SRRG significantly improves report generation quality. We publicly release dataset, code, and checkpoints to fac
The production of knowledge has become increasingly a global endeavor. Yet, location related factors, such as local working environment and national policy designs, may continue to affect what kind of science is being pursued. Here we examine the geography of the production of creative science by country, through the lens of novelty and atypicality proposed in Uzzi et al. (2013). We quantify a country's representativeness in novel and atypical science, finding persistent differences in propensity to generate creative works, even among developed countries that are large producers in science. We further cluster countries based on how their tendency to publish novel science changes over time, identifying one group of emerging countries. Our analyses point out the recent emergence of China not only as a large producer in science but also as a leader that disproportionately produces more novel and atypical research. Discipline specific analysis indicates that China's over-production of atypical science is limited to a few disciplines, especially its most prolific ones like materials science and chemistry.