Background and Objectives: Clinical Practice Guidelines (CPGs) represent the foremost methodology for sharing state-of-the-art research findings in the healthcare domain with medical practitioners to limit practice variations, reduce clinical cost, improve the quality of care, and provide evidence based treatment. However, extracting relevant knowledge from the plethora of CPGs is not feasible for already burdened healthcare professionals, leading to large gaps between clinical findings and real practices. It is therefore imperative that state-of-the-art Computing research, especially machine learning is used to provide artificial intelligence based solution for extracting the knowledge from CPGs and reducing the gap between healthcare research/guidelines and practice. Methods: This research presents a novel methodology for knowledge extraction from CPGs to reduce the gap and turn the latest research findings into clinical practice. First, our system classifies the CPG sentences into four classes such as condition-action, condition-consequences, action, and not-applicable based on the information presented in a sentence. We use deep learning with state-of-the-art word embedding, im
Biomedical research encompasses diverse types of activities, from basic science ("bench") to clinical medicine ("bedside") to bench-to-bedside translational research. It, however, remains unclear whether different types of research receive citations at varying rates. Here we aim to answer this question by using a newly proposed paper-level indicator that quantifies the extent to which a paper is basic science or clinical medicine. Applying this measure to 5 million biomedical papers, we find a systematic citation disadvantage of clinical oriented papers; they tend to garner far fewer citations and are less likely to be hit works than papers oriented towards basic science. At the same time, clinical research has a higher variance in its citation. We also find that the citation difference between basic and clinical research decreases, yet still persists, if longer citation-window is used. Given the increasing adoption of short-term, citation-based bibliometric indicators in funding decisions, the under-cited effect of clinical research may provide disincentives for bio-researchers to venture into the translation of basic scientific discoveries into clinical applications, thus providi
Clinical transfusion-outcomes research faces unique methodological challenges compared with other areas of clinical research. These challenges arise because patients frequently receive multiple transfusions, each unit originates from a different donor, and the probability of receiving specific blood product characteristics is influenced by external, often uncontrollable, factors. These complexities complicate causal inference in observational studies of transfusion effectiveness and safety. This guide addresses key challenges in observational transfusion research, with a focus on time-varying exposure, time-varying confounding, and treatment-confounder feedback. Using the example of donor sex and pregnancy history in relation to recipient mortality, we illustrate the strengths and limitations of commonly used analytical approaches. We compare restriction-based analyses, time-varying Cox regression, and inverse probability weighted marginal structural models using a large observational dataset of male transfusion recipients. In the applied example, restriction and conventional time-varying approaches suggested an increased mortality risk associated with transfusion of red blood cell
Obesity is defined as the excessive accumulation or abnormal distribution of body fat. According to data from World Obesity Atlas 2024, the increase in prevalence of obesity has become a major worldwide health problem in adults as well as among children and adolescents. Although an increasing number of drugs have been approved for the treatment of obesity in recent years, many of these drugs have inevitable side effects which have increased the demand for new safe, accessible and effective drugs for obesity and prompt interest in natural products. Berberine (BBR) and its metabolites, known for their multiple pharmacological effects. Recent studies have emphatically highlighted the anti-obesity benefits of BBR and the underlying mechanisms have been gradually elucidated. However, its clinical application is limited by poor oral absorption and low bioavailability. Based on this, this review summarizes current research on the anti-obesity effects of BBR and its metabolites, including advancements in clinical trail results, understanding potential molecular mechanisms and absorption and bioavailability. As a natural compound derived from plants, BBR holds potential as an alternative ap
The competency of any intelligent agent is bounded by its formal account of the world in which it operates. Clinical AI lacks such an account. Existing frameworks address evaluation, regulation, or system design in isolation, without a shared model of the clinical world to connect them. We introduce the Clinical World Model, a framework that formalizes care as a tripartite interaction among Patient, Provider, and Ecosystem. To formalize how any agent, whether human or artificial, transforms information into clinical action, we develop parallel decision-making architectures for providers, patients, and AI agents, grounded in validated principles of clinical cognition. The Clinical AI Skill-Mix operationalizes competency through eight dimensions. Five define the clinical competency space (condition, phase, care setting, provider role, and task) and three specify how AI engages human reasoning (assigned authority, agent facing, and anchoring layer). The combinatorial product of these dimensions yields a space of billions of distinct competency coordinates. A central structural implication is that validation within one coordinate provides minimal evidence for performance in another, re
Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information as well, and, as such, can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: For the 423 patients, 101 deteriorated, 223 improved and in 99 there was no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Le
Despite obesity being widely discussed in the social sciences, the effect of a robot's perceived obesity level on trust is not covered by the field of HRI. While in research regarding humans, Body Mass Index (BMI) is commonly used as an indicator of obesity, this scale is completely irrelevant in the context of robots, so it is challenging to operationalize the perceived obesity level of robots; indeed, while the effect of robot's size (or height) on people's trust in it was addressed in previous HRI papers, the perceived obesity level factor has not been addressed. This work examines to what extent the perceived obesity level of humanoid robots affects people's trust in them. To test this hypothesis, we conducted a within-subjects study where, using an online pre-validated questionnaire, the subjects were asked questions while being presented with two pictures of humanoids, one with a regular obesity level and the other with a high obesity level. The results show that humanoid robots with lower perceived obesity levels are significantly more likely to be trusted.
Demographic data collection is essential in education research, as demographic data allows researchers to better describe the participant population they study and to contextualize findings. However, current research practices for neurodiversity demographics often rely on prescriptive methods (e.g., requiring participants to report official diagnoses) rather than allowing participants to self-identify. This approach can: a) not allow participants to express their intersecting identities in ways that are authentic; and b) limit trustworthiness and reliability of the data and interpretation. In addition, inconsistent dissemination and representation of demographic data across studies hinder the accessibility and usability of this work. Through a literature review of neurodivergent student experiences with learning and performing STEM, we identified widespread discrepancies in how demographic information is collected and reported. This paper explores how neurodivergent identities can be more accurately and inclusively represented in education research. We present findings of a thematic analysis on the ways neurodivergent demographic data collection is done in the literature using data
Clinical Pathways (CP) are medical management plans developed to standardize patient treatment activities, optimize resource usage, reduce expenses, and improve the quality of healthcare services. Most CPs currently in use are paper-based documents (i.e., not computerized). CP computerization has been an active research topic since the inception of CP use in hospitals. This literature review research aims to examine studies that focused on CP computerization and offers recommendations for future research in this important research area. Some critical research suggestions include centralizing computerized CPs in Healthcare Information Systems (HIS), CP term standardization using international medical terminology systems, developing a global CP-specific digital coding system, creating a unified CP meta-ontology, developing independent Clinical Pathway Management Systems (CPMS), and supporting CPMSs with machine learning sub-systems.
It has been suggested that bibliometric analysis of different document types may reveal new aspects of research performance. In medical research a number of study types play different roles in the research process and it has been shown, that the evidence-level of study types is associated with varying citation rates. This study focuses on clinical practice guidelines, which are supposed to gather the highest evidence on a given topic to give the best possible recommendation for practitioners. The quality of clinical practice guidelines, measured using the AGREE score, is compared to the citations given to the references used in these guidelines, as it is hypothesised, that better guidelines are based on higher cited references. AGREE scores are gathered from reviews of clinical practice guidelines on a number of diseases and treatments. Their references are collected from Web of Science and citation counts are normalised using the item-oriented z-score and the PPtop-10% indicators. A positive correlation between both citation indicators and the AGREE score of clinical practice guidelines is found. Some potential confounding factors are identified. While confounding cannot be exclud
Clinical NLP increasingly relies on electronic health record (EHR) data to detect suicidal behaviors, treating clinical documentation as more reliable ground truth than social media. We argue that this framing obscures how EHR-based suicidality datasets encode a particular operationalization of suicidality, shaped by who authors the data, how episodes are bounded, and how ambiguity is resolved. We ground this argument in a case study of the ScAN dataset, built over MIMIC-III clinical notes. We show how governance constraints, ICD-based cohort selection, single-annotator labeling, and hospital-stay-level aggregation produce labels that reflect clinician-documented judgments, treat suicidality as a bounded episode, and assume that intent can be reliably inferred from documentation. A linguistic analysis demonstrates that identical labels subsume heterogeneous clinical framings differing in temporality, negation, and uncertainty. We argue that clinical NLP should examine the assumptions embedded in suicidality datasets before interpreting their labels as ground truth.
Empiric antibiotic prescribing in high-risk clinical contexts often requires decision making under conditions of incomplete information, where inappropriate coverage or unjustified escalation may compromise safety and antimicrobial stewardship. While clinical decision-support systems have been proposed to assist in this process, many approaches lack explicit governance and evaluation mechanisms defining scope, abstention conditions, recommendation permissibility, and expected system behavior. This work specifies a governance and evaluation framework for deterministic clinical decision-support systems operating under explicitly constrained scope. Deterministic behavior is adopted to ensure that identical inputs yield identical outputs, supporting transparency, auditability, and conservative decision support in high-risk prescribing contexts. The framework treats governance as a first-class design component, separating clinical decision logic from rule-based mechanisms that determine whether a recommendation may be issued. Explicit abstention, deterministic stewardship constraints, and exclusion rules are formalized as core constructs. The framework defines an evaluation methodology
Developing AI models that are useful in clinical practice, requires efficient collaboration between clinicians and AI developers. This poses a practical challenge: clinicians must repeatedly communicate and refine their requirements with AI developers before those requirements can be translated into executable model development. This iterative process is time-consuming, and even after repeated discussion, misalignment may still exist because the two sides do not fully share each other's expertise. Coding agents may help close this gap. They can write and refine code on their own, and they carry working knowledge of both medicine and AI to understand commands formulated by both medical experts and developers. We present a prototype that lets clinicians drive AI development directly. A clinician describes the task in plain language, and the system turns the description into a working pipeline, refines it through repeated experiments together with the clinician, and returns a model that meets the stated clinical objective. Across five clinical tasks, the system reliably produces models that matched the clinician's request and reached competitive performance. Most notably, on chest rad
Evidence-based practice (EBP) in software engineering aims to improve decision-making in software development by complementing practitioners' professional judgment with high-quality evidence from research. We believe the use of EBP techniques may be helpful for research software engineers (RSEs) in their work to bring software engineering best practices to scientific software development. In this study, we present an experience report on the use of a particular EBP technique, rapid reviews, within an RSE team at Sandia National Laboratories, and present practical recommendations for how to address barriers to EBP adoption within the RSE community.
Introduction: Semantic search, which retrieves documents based on conceptual similarity rather than keyword matching, offers substantial advantages for retrieval of clinical information. However, deploying semantic search across entire health systems, comprising hundreds of millions of clinical notes, presents formidable engineering, cost, and governance challenges that have prevented adoption. Methods: We deployed a semantic search system at a large children's hospital indexing 166 million clinical notes (484 million vectors) from 1.68 million patients. The system uses instruction-tuned qwen3-embedding-0.6B embeddings, stores vectors in a managed database with storage-optimized indexing, maintains full-text metadata in a low-latency key-value store, and operates within a HIPAA-compliant governance framework. We evaluated the system through three experiments: optimization of embedding model and chunking strategy using a physician-authored benchmark dataset, characterization of full-scale performance (cost, latency, retrieval quality), and clinical utility assessment via comparison of chart abstraction efficiency across three tasks. Results: The system delivers sub-second query late
Authorship ethics is a central topic of discussion in research ethics fora. There are various guidelines for authorship (i.e., naming and order). It is not easy to decide the authorship in the presence of varying authorship guidelines. This paper gives an overview of research on authorship practices and issues. It presents a review of 16 empirical research papers published between 2014 -- 2020. The objective is to learn how various research disciplines handle authorship. What are the authorship practices in various research disciplines, and what are the issues associated with these practices?
We introduce Clinical ModernBERT, a transformer based encoder pretrained on large scale biomedical literature, clinical notes, and medical ontologies, incorporating PubMed abstracts, MIMIC IV clinical data, and medical codes with their textual descriptions. Building on ModernBERT the current state of the art natural language text encoder featuring architectural upgrades such as rotary positional embeddings (RoPE), Flash Attention, and extended context length up to 8,192 tokens our model adapts these innovations specifically for biomedical and clinical domains. Clinical ModernBERT excels at producing semantically rich representations tailored for long context tasks. We validate this both by analyzing its pretrained weights and through empirical evaluation on a comprehensive suite of clinical NLP benchmarks.
We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative and unstructured nature of clinical notes is a major obstacle for healthcare intelligentization. We address a critical problem of structuring clinical notes into clinical data, according to international interoperability standards. We collect and annotate data for three subtasks, namely, international patient summary, clinical impression and medical encounter. We then supervised fine-tuned a state-of-the-art LLM using public and credentialed clinical data. The training is orchestrated in a way that the target model can first support basic clinical tasks such as abbreviation expansion and temporal information extraction, and then learn to perform more complex downstream clinical tasks. Moreover, we address several modeling challenges in the healthcare context, e.g., extra long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, comparable to Gemini-pro, with a mild gap from GPT-4. We believe that LLMs may become a step-stone towards healthcare digitalization and democratization.
We evaluate the impact of large language model-based clinical decision support in live care. In partnership with Penda Health, a network of primary care clinics in Nairobi, Kenya, we studied AI Consult, a tool that serves as a safety net for clinicians by identifying potential documentation and clinical decision-making errors. AI Consult integrates into clinician workflows, activating only when needed and preserving clinician autonomy. We conducted a quality improvement study, comparing outcomes for 39,849 patient visits performed by clinicians with or without access to AI Consult across 15 clinics. Visits were rated by independent physicians to identify clinical errors. Clinicians with access to AI Consult made relatively fewer errors: 16% fewer diagnostic errors and 13% fewer treatment errors. In absolute terms, the introduction of AI Consult would avert diagnostic errors in 22,000 visits and treatment errors in 29,000 visits annually at Penda alone. In a survey of clinicians with AI Consult, all clinicians said that AI Consult improved the quality of care they delivered, with 75% saying the effect was "substantial". These results required a clinical workflow-aligned AI Consult i
This paper is dedicated to the design and evaluation of the first AMR parser tailored for clinical notes. Our objective was to facilitate the precise transformation of the clinical notes into structured AMR expressions, thereby enhancing the interpretability and usability of clinical text data at scale. Leveraging the colon cancer dataset from the Temporal Histories of Your Medical Events (THYME) corpus, we adapted a state-of-the-art AMR parser utilizing continuous training. Our approach incorporates data augmentation techniques to enhance the accuracy of AMR structure predictions. Notably, through this learning strategy, our parser achieved an impressive F1 score of 88% on the THYME corpus's colon cancer dataset. Moreover, our research delved into the efficacy of data required for domain adaptation within the realm of clinical notes, presenting domain adaptation data requirements for AMR parsing. This exploration not only underscores the parser's robust performance but also highlights its potential in facilitating a deeper understanding of clinical narratives through structured semantic representations.