Agentic artificial intelligence (AI) systems, characterized by autonomous goal-directed behavior, multi-step reasoning, task decomposition, and tool use, are increasingly proposed for healthcare applications. However, their autonomy raises concerns regarding transparency, accountability, and human oversight. While explainable AI (XAI) has been widely studied in traditional predictive models, less is known about how explainability is implemented within agentic architectures. This review aimed to map the emerging literature on explainable agentic AI (XAAI) in healthcare and to characterize the types, scope, and forms of explainability used in these systems. A scoping review was conducted following PRISMA-ScR guidelines. PubMed, Embase, IEEE Xplore, and ACM Digital Library were searched through November 2025. Eligible studies described healthcare-related agentic AI systems incorporating explicit explainability mechanisms. Data were extracted on system architecture, explainability type (intrinsic, post hoc, hybrid), explanation scope (local, global), explanation form, and reported clinical outcomes. Nine studies met the inclusion criteria. All systems demonstrated core agentic features, including autonomy, task decomposition, and tool integration, often within multi-agent frameworks. Explainability was predominantly intrinsic and workflow-native, typically delivered through textual reasoning traces and example-based grounding in retrieved clinical evidence. Feature-based and global explanations were comparatively rare and largely confined to hybrid architectures. Across domains including radiology, neurology, psychiatry, and biomedical research, XAAI systems were reported to improve performance and interpretability relative to baseline models in the included studies. However, these findings were derived from heterogeneous, predominantly experimental or retrospective studies, and structured human-in-the-loop oversight was infrequently described.
Current XAAI systems appear to emphasize process transparency and evidence grounding rather than mechanistic model-level attribution. The available evidence remains limited and heterogeneous, and findings should be interpreted as early trends rather than established characteristics. Further progress will require standardized evaluation frameworks, clearer reporting of oversight mechanisms, and validation in real-world clinical settings to support safe and trustworthy integration of agentic AI into healthcare practice.
Artificial intelligence (AI) is shifting from passive, user-initiated tools to proactive agentic systems capable of autonomous, multi-step actions. These agents can independently gather information, execute sequential tasks, and collaborate with humans or other agents without constant human prompting. Early adopters in health care have demonstrated feasibility across multiple specialties and clinical settings. Dermatology is well-positioned to benefit given its high patient volumes, administrative burdens, and clinicopathological workflows. To guide responsible adoption of agentic AI, we propose a risk-stratification framework based on clinical risk and task reversibility. Barriers to widespread adoption of agentic AI include limitations in model reliability, interoperability across health records, and unresolved questions around liability, privacy, and regulation. Dermatologists must proactively engage via professional organizations and industry partnerships to ensure that agentic AI is developed safely, equitably, and in alignment with our values.
As agentic artificial intelligence systems become increasingly embedded in medical imaging, practice is moving from episodic decision support to workflow-based architectures that alter how practitioners think and practise. Medical imaging practice is traditionally conceptualised using Dual Process Theory, which describes how practitioners use their System 1 (intuitive decision making) and System 2 (analytic decision making) in practice. However, as more practitioners incorporate agentic artificial intelligence systems into their workflow, a Tri-System framework may be required. This Perspective paper will show how the practitioner and an agentic artificial intelligence system become part of a cognitive team known as System 3. It will argue that an appropriate level of cognitive surrender should be considered and that current decision making should be reframed through diagnostic complementarity, with added emphasis on structured human and AI interaction to achieve optimal performance. We recommend the implementation of the following educational methods in radiography programmes: (a) training students using fault-injected medical images to reinforce the importance of human verification in image interpretation; (b) preparing students to supervise the performance of agentic artificial intelligence systems; (c) normalising AI-assisted activities to mitigate potential deskilling.
We present a multi-agentic workflow for critical materials recovery that deploys a series of AI agents and automated instruments to recover critical materials from produced water and magnet leachates. This approach achieves selective precipitation from real-world feedstocks using simple chemicals, accelerating the optimization of efficient, adaptable, and scalable separations to a timeline of days, rather than months and years.
Background: Clinical documentation and information retrieval consume over half of physicians' working hours, contributing to cognitive overload and burnout. While artificial intelligence (AI) offers a potential solution, concerns over hallucinations and source reliability have limited adoption at the point of care. This study aimed to evaluate physician-perceived time efficiency, decision-making support, and satisfaction with DR. INFO, an agentic AI clinical assistant, in routine clinical practice. Methodology: In this prospective, single-arm, pilot feasibility study, 29 physicians and medical students across multiple specialties in Portuguese healthcare institutions used DR. INFO v1.0 over five working days within a two-week period. Outcomes were assessed via daily Likert-scale evaluations (time saving and decision support) and a final Net Promoter Score (NPS). Non-parametric methods were used throughout, with bootstrap confidence intervals (CIs) and sensitivity analysis to address non-response. Results: Physicians reported high perceived time saving (mean = 4.27/5; 95% CI = 3.97-4.57) and decision support (mean = 4.16/5; 95% CI = 3.86-4.45), with ratings stable across the five-day study window. Among the 16 (55%) participants who completed the final evaluation, the NPS was 81.2, with no detractors; sensitivity analysis indicated an NPS of 44.8 under conservative non-response assumptions. Conclusions: Physicians across specialties and career stages reported positive perceptions of DR. INFO for both time efficiency and clinical decision support within the study window. These findings are preliminary and should be confirmed in larger, controlled studies that include objective performance measures and independent accuracy verification.
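The two NPS figures reported above can be checked with a short calculation. The sketch below is illustrative only: the promoter count of 13 is inferred from the reported numbers (16 respondents, no detractors, NPS of about 81.2) and is not stated in the abstract, and treating the 13 non-responders as passives is an assumed reading of the "conservative non-response assumptions" that happens to reproduce the reported 44.8.

```python
def net_promoter_score(promoters: int, detractors: int, total: int) -> float:
    """NPS = percentage of promoters minus percentage of detractors."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (promoters - detractors) / total


# Reported in the abstract: 29 enrolled, 16 completed the final evaluation,
# no detractors.
ENROLLED = 29
RESPONDENTS = 16
DETRACTORS = 0

# Inferred, not reported: 13 promoters (and hence 3 passives) is the only
# count that yields an NPS of about 81.2 among 16 respondents.
PROMOTERS = 13

observed = net_promoter_score(PROMOTERS, DETRACTORS, RESPONDENTS)

# Assumed sensitivity scenario: every non-responder is counted as a passive,
# so the denominator grows to all enrolled participants.
conservative = net_promoter_score(PROMOTERS, DETRACTORS, ENROLLED)

print(f"{observed:.2f} {conservative:.2f}")  # prints: 81.25 44.83
```

Under these assumptions, the gap between the two figures comes entirely from the denominator, which is why the 55% completion rate matters when interpreting the headline score.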
Nanopore sequencing has enabled access to various layers of information about DNA and RNA sequence isoforms and chemical modifications. Yet the archipelago of disjoint nanopore analysis tools makes navigating among them a significant challenge for the nanopore user. We present NanoCortex, a unified autonomous agentic framework designed to address this shortcoming by providing end-to-end data processing, from raw signal basecalling to biological interpretation. Built upon Gemini API services that incur usage-based API costs and orchestrated through the Gemini Agent Development Kit (ADK), the system uses a multi-agent architecture to autonomously perform task parsing, code generation, iterative code-level self-correction, and scientific interpretation. Once generated, the code can be used offline. Benchmarking reveals that NanoCortex achieves significantly higher usability across complex analytical tasks compared to general-purpose large language models. The framework integrates experimental data with meta-analysis of publicly available biological databases to facilitate the extraction of biologically meaningful insights from sequencing data without cumbersome computational steps.
Agentic AI systems integrate foundation models, prompt templates, tool connectors, orchestration logic, and containerised dependencies, creating exploitability conditions that cannot be inferred from static Software Bills of Materials (SBOMs). Artificial Intelligence Bills of Materials (AIBOM) extend transparency to AI-specific artefacts, yet current CSAF/VEX workflows remain based on static component-CVE correlation without runtime validation. A protocol-driven framework is presented that binds SBOM and AIBOM artefacts to deterministic environment capture and structured runtime telemetry. Exploitability is computed from declared artefacts, observed activation conditions, and enforced execution policies. CSAF-VEX advisories are generated from combined static and runtime evidence, cryptographically signed, and validated through deterministic replay. Evaluation uses approximately 10,000 component entries across synthetic Agentic AI workloads (50-5,000 components), incorporating OSV, GitHub Advisory, KEV, and EPSS datasets. Under controlled experimental conditions, the framework achieves an F1-score of 0.93 (precision 0.96, recall 0.92), reduces false positives by up to 42% relative to static SBOM-CVE matching without runtime validation, and alters exploitability outcomes in 31% of AI-specific artefact cases through AIBOM extension. Advisory artefacts remain reproducible under deterministic replay. Binding AIBOM artefacts to runtime telemetry transforms CSAF-VEX generation from static disclosure into execution-grounded exploitability assessment for Agentic AI supply chains.
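To make the idea of combining declared artefacts, observed activation conditions, and enforced execution policies concrete, here is a minimal sketch of one possible decision rule. It is not the framework's algorithm: the `Evidence` fields, the `assess` function, and the ordering of the rules are all hypothetical, and only the three status names are borrowed from the CSAF product status vocabulary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class VexStatus(Enum):
    # Names borrowed from CSAF product status categories.
    KNOWN_AFFECTED = "known_affected"
    KNOWN_NOT_AFFECTED = "known_not_affected"
    UNDER_INVESTIGATION = "under_investigation"


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence record for one component/advisory pair."""
    declared: bool             # listed in the SBOM/AIBOM with a matching advisory
    activated: Optional[bool]  # vulnerable path seen in telemetry; None = no telemetry
    policy_blocks: bool        # an enforced execution policy blocks the vulnerable path


def assess(e: Evidence) -> VexStatus:
    """Assumed rule order: declaration, then policy, then runtime observation."""
    if not e.declared:
        return VexStatus.KNOWN_NOT_AFFECTED
    if e.policy_blocks:
        return VexStatus.KNOWN_NOT_AFFECTED
    if e.activated is None:
        # Static match only: no runtime evidence either way.
        return VexStatus.UNDER_INVESTIGATION
    return VexStatus.KNOWN_AFFECTED if e.activated else VexStatus.KNOWN_NOT_AFFECTED


static_only = assess(Evidence(declared=True, activated=None, policy_blocks=False))
never_loaded = assess(Evidence(declared=True, activated=False, policy_blocks=False))
print(static_only.value, never_loaded.value)
```

In this toy rule, a component that static SBOM-CVE matching would flag is downgraded when telemetry shows the vulnerable path is never activated, which is the kind of case where runtime validation could reduce false positives.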
Spatial transcriptomics and proteomics map tissue architecture and cellular interactions, but analysis remains limited by programming demands and text-centered AI agents that lack viewer grounding and cross-turn context. We present spatiAlytica, a viewer-centric multimodal interactive agentic system embedded in the Napari viewer that enables non-programmer biologists to perform iterative, hypothesis-driven spatial omics analysis via natural language. spatiAlytica couples viewer-state serialization, agentic memory, biological concept-to-data-field mapping, code generation and debugging, Spatial VQA, and grounded interpretation to support an exploratory analysis and interpretive reasoning workflow. We introduce spatiAlyticaBench, a comprehensive benchmark spanning 222 single-turn spatial analytical coding questions, 178 multi-turn sequential workflow questions, and 7,350 image-grounded reasoning questions. spatiAlytica outperformed strong agentic baselines while using less time and fewer tokens. Case studies across Kaposi's sarcoma (KS), colorectal cancer, and ovarian cancer recapitulated known spatial patterns and uncovered progressive CD8 T-cell dysfunction during KS progression.
Successful adaptation to life at the highest altitudes ("Roof of the world") is part of humanity's evolutionary heritage. With rising interest in adventure travel and expanding transport networks that facilitate mobility from low to high altitudes, provision of healthcare for populations living in high-altitude regions has re-emerged as an area of interest and research. These populations have several unique characteristics that limit the simple generalization of medical knowledge. First, these populations are naturally segregated into distinct ethnic groups, representing a unique marginal demographic. Second, the harsh natural environment, underdeveloped healthcare infrastructure, and limited research into the healthcare needs and challenges of highland communities pose significant barriers to equitable healthcare access. Medical artificial intelligence and digital technology offer an opportunity to provide innovative solutions for these populations. However, in their current state these technologies would not facilitate health equity, as most are narrow in their application, are not trained on data representative of these regions, and ignore the multifactorial nature of health, which combines biological and physiological factors with environmental and social factors. The success of generalist models for tasks such as scientific discovery provides a mechanism to leapfrog existing challenges and provide equitable care in these regions. In this paper, we discuss the opportunity of intelligent medical agents developed on generalist foundation models to meet the unique needs of high-altitude populations.
Variant interpretation in rare diseases requires navigating multiple genomic databases, each with strict input formats, while synthesizing heterogeneous evidence. This process creates significant barriers for non-experts and imposes a substantial cognitive burden on experienced specialists. These challenges are evident in tools such as model organism aggregated resources for rare variant exploration (MARRVEL), which require precise variant formatting (e.g., Human Genome Variation Society [HGVS] notation) and return complex, heterogeneous outputs. To address these usability barriers, we developed MARRVEL-MCP, a natural-language interface that enables large language models (LLMs) to perform end-to-end variant interpretation via structured tool access. This work demonstrates the impact of tool-augmented context engineering, the purposeful design of domain-aware tool environments and structured information scaffolding through executable function interfaces, on reshaping the role of model scale in genomics. MARRVEL-MCP equips LLMs with 44 tools spanning gene and variant utilities, pathogenicity databases, phenotype resources, expression atlases, ortholog data, and literature APIs. Without hard-coded workflows, LLMs infer which tools to invoke and in what sequence, performing named-entity recognition, identifier normalization, and multi-database synthesis from clinical queries. Using 100 expert-curated questions, lightweight models (3B-20B parameters) with MARRVEL-MCP matched or outperformed larger models without tool access. A 20B-parameter model (gpt-oss-20b) achieved a 94% pass rate, versus 41% without MARRVEL-MCP, approaching state-of-the-art proprietary performance. Although expert oversight remains essential and tool use adds cost, these results show that contextual guidance can compensate for limited model capacity. 
These findings establish context engineering as a core principle for biomedical AI and support scalable integration of LLMs with curated genomic resources.
Egocentric videos are inherently long-form, as they provide a continuous, first-person perspective of daily life, capturing complex social interactions and routines that naturally span days or weeks. Understanding and reasoning over egocentric videos that span hours or even days poses significant challenges due to their length, multimodal nature, and complex temporal dependencies over long time horizons. To this end, we introduce Ego-R1, a novel framework for reasoning over ultra-long (i.e., days and weeks) egocentric videos. Ego-R1 leverages a structured Chain-of-Tool-Thought (CoTT) process, orchestrated by an Ego-R1 Agent trained via reinforcement learning (RL). Inspired by human problem-solving strategies, CoTT decomposes complex reasoning into modular steps, empowering the agent to act as a high-level controller that dynamically invokes specialized tools, such as hierarchical memory retrievers and multimodal perceptors, to iteratively and collaboratively answer sub-questions. This approach enables effective temporal abstraction, long-horizon dependency tracking, and step-by-step multimodal reasoning. The framework is built upon a flexible toolkit designed for efficient temporal retrieval and granular visual analysis: Hierarchical RAG (H-RAG), a text-based module that performs efficient top-down temporal localization by aggregating video logs from day-level summaries down to 10-minute intervals; Video-LLM, a short-horizon perception module that analyzes local temporal windows to interpret dynamic interactions; and VLM, a fine-grained vision-language model used to extract high-resolution details, such as text or object attributes, from specific frames. We design a two-stage training paradigm: supervised fine-tuning (SFT) of a pretrained language model on CoTT data, to enable dynamic tool proposal for long-range reasoning, followed by RL, to improve the agent's ability to plan effectively with tools.
To facilitate training, we construct Ego-R1 Data, which consists of Ego-CoTT-25K for SFT and Ego-QA-4.4K for RL. Furthermore, we evaluate Ego-R1 on a newly curated week-long video QA benchmark, Ego-R1 Bench, which contains hybrid-source, human-verified QA pairs. Extensive experiments show that our 3B-parameter Ego-R1 Agent achieves the strongest performance among open-weight and tool-agent baselines, while offering interpretable tool-grounded reasoning trajectories. On Ego-R1 Bench, Ego-R1 achieves 46.0% accuracy, substantially outperforming Gemini-1.5-Pro (38.3%) and LLaVA-Video (29.0%); we further report Gemini-3.1-Pro as a stronger closed-source reference at 53.7%. Moreover, the framework exhibits strong generalization to standard exocentric video benchmarks; by leveraging the long-video nature of egocentric data to train the orchestrator's planning capabilities rather than overfitting the perceptors to a specific view, our modular design remains robust across domains. Ego-R1 Agent achieves 64.9% accuracy on Video-MME (long), surpassing leading open-weight models. These results validate that dynamic, tool-augmented reasoning effectively bridges the gap between limited context windows and the demands of understanding both week-long first-person experiences and general long-form video content.
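The top-down temporal localization attributed to H-RAG above can be pictured as a descent through a tree of summaries, from day level to 10-minute intervals. The sketch below is a toy illustration under stated assumptions, not the Ego-R1 implementation: the `Node` type, the `localize` function, the word-overlap scoring, and the example summaries are all hypothetical stand-ins for whatever retriever and logs the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One summary in the hierarchy: a day, an hour, or a 10-minute interval."""
    label: str
    summary: str
    children: List["Node"] = field(default_factory=list)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase words (an assumption)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def localize(root: Node, query: str) -> List[str]:
    """Greedy top-down descent: at each level, follow the best-matching child."""
    path: List[str] = []
    node = root
    while node.children:
        node = max(node.children, key=lambda c: overlap(query, c.summary))
        path.append(node.label)
    return path


week = Node("week", "", [
    Node("day1", "shopping at the market and walking in the park", [
        Node("day1 15:00", "buying vegetables at the market", [
            Node("day1 15:10-15:20", "choosing tomatoes"),
        ]),
    ]),
    Node("day2", "friends cooked pasta for dinner at home", [
        Node("day2 18:00", "alice cooked pasta in the kitchen", [
            Node("day2 18:00-18:10", "boiling water"),
            Node("day2 18:10-18:20", "alice cooked pasta sauce"),
        ]),
        Node("day2 20:00", "watching a movie", [
            Node("day2 20:00-20:10", "choosing a movie"),
        ]),
    ]),
])

path = localize(week, "who cooked pasta")
print(path)  # prints: ['day2', 'day2 18:00', 'day2 18:10-18:20']
```

The point of the hierarchy is that each step only compares a handful of summaries, so a week of footage is narrowed to a 10-minute window without scanning every interval; a downstream perception tool would then inspect only that window.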
Contemporary clinical practice still produces unstructured data such as free-text reports or scans, hindering automated interpretation by knowledge-based clinical decision support (CDS) systems that rely on structured data. Large language models (LLMs) show potential for interpreting such findings but face challenges in accuracy, infrastructure demands, and data privacy. Integrating LLMs with modular knowledge-based CDS systems could provide validated interpretations of such findings, but models need to call CDS modules with perfectly accurate parameters. Using a novel framework, the accuracy of multiple size classes of LLMs was tested in calling Arden Syntax Medical Logic Modules for hepatitis serology interpretation of varying complexity from unstructured multi-modal inputs. Computationally lean LLMs like GPT-OSS were found to handle a small number of low-complexity parameters with high accuracy, approaching clinical feasibility for private and reliable CDS interpretation of multi-modal data. Accuracy decreased sharply for tools involving more numerous or complex quantitative parameters.
Agentic tools - software environments where a large language model plans, calls external tools, executes code, and iterates with minimal human intervention - will run a substantial share of routine biomedical data analysis within the next few years. However, per-call inference cost on frontier models is the bottleneck and can add up quickly. Here, we tested whether a free, locally-runnable open-weight model could take over the repetitive execution steps at frontier accuracy. We used Claude's Opus to author plans of increasing detail for per-sample variant calling, and ran six 2026-release open-weight implementer LLMs against those plans on a set of desktop GPUs. qwen3.6:27b reproduced frontier accuracy on every plan and matched Opus cell-for-cell on a 36-cell error-injection matrix. A sub-$2,000 Jetson or Apple Mac Mini sufficed for the implementer side. The open-weight model landscape evolves on the order of months, so the specific implementer recommended here will be superseded; we provide the plans, harness, scoring code, and per-cell artifacts at https://github.com/nekrut/LLM-eval-paper as a framework for re-evaluating future models.
We present an agentic workflow that converts heterogeneous safety evidence into concise, reproducible drug summaries. While automated FAERS summarization, retrieval-augmented generation, and tool-driven agents exist in isolation, our contribution lies in their integration within a schema-aware, deterministic pipeline with explicit versioning and pharmacokinetic contextualization. The system queries FAERS via OpenFDA, integrates curated cytochrome P450 mappings, and can retrieve recent PubMed records. It normalizes fields, computes predefined aggregates, assesses enzyme overlap between index drugs and frequent co-medications, and generates constrained narratives and figures directly from computed tables. Applied to 110 drugs, the workflow recovered clear cross-drug patterns in severe outcomes and identified per-drug leaders for death and hospitalization. Case examples for clopidogrel and voriconazole illustrate how co-reporting patterns combined with CYP context provide mechanistic framing without implying causality. Deterministic execution, versioned queries, and cached responses enable exact reruns and audit. The workflow produces structured safety briefs that support safety committee review, early signal triage, and the selection of targets for confirmatory pharmacoepidemiologic studies.
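The enzyme-overlap step described above amounts to intersecting sets of CYP enzymes for the index drug and each co-medication. The sketch below is a minimal illustration: the function name `enzyme_overlap` is hypothetical, and the small `CYP_MAP` table is an illustrative placeholder, not the curated cytochrome P450 mapping the workflow uses.

```python
from typing import Dict, List, Set

# Illustrative placeholder mapping (NOT the workflow's curated table).
CYP_MAP: Dict[str, Set[str]] = {
    "clopidogrel": {"CYP2C19", "CYP3A4"},
    "omeprazole": {"CYP2C19", "CYP3A4"},
    "voriconazole": {"CYP2C19", "CYP2C9", "CYP3A4"},
    "metformin": set(),
}


def enzyme_overlap(
    index_drug: str,
    co_medications: List[str],
    cyp_map: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Return, per co-medication, the sorted CYP enzymes shared with the index drug.

    Co-medications with no shared enzyme (or no mapping) are omitted.
    Sorting keeps the output identical across reruns.
    """
    index_enzymes = cyp_map.get(index_drug, set())
    result: Dict[str, List[str]] = {}
    for med in co_medications:
        shared = index_enzymes & cyp_map.get(med, set())
        if shared:
            result[med] = sorted(shared)
    return result


shared = enzyme_overlap(
    "clopidogrel", ["omeprazole", "metformin", "voriconazole"], CYP_MAP
)
print(shared)
```

A shared enzyme here only flags a co-reported pair as worth pharmacokinetic context; as the abstract notes, it provides mechanistic framing without implying causality.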
Bullying, a significant global issue detrimental to student well-being, is increasingly understood as a goal-directed strategy within power-imbalanced contexts. This study investigates the relationships among agentic goals, two resource control strategies (coercive and prosocial), and bullying behaviors. A sample of 1,000 Chinese adolescents (mean age = 13.6 years) completed measures of agentic goals, resource control strategies, and bullying behavior. Adopting a person-oriented approach, we first used latent profile analysis (LPA) on prosocial strategy scores to identify heterogeneous subgroups. Subsequently, variable-oriented moderated mediation models were examined within each subgroup. LPA delineated two distinct subgroups: a High Prosocial Orientation Group (73.8%) and a Low Prosocial Orientation Group (26.2%). Across the sample, agentic goals were positively associated with bullying, mediated by coercive strategies. The critical finding was that prosocial strategies moderated this mediation pathway; however, this moderated mediation effect was significant only within the High Prosocial Orientation Group. This study supports a nonpathological, goal-oriented framework for understanding bullying. The findings reveal that the protective role of prosocial strategies is conditional, effectively moderating the harmful pathway from agentic goals to bullying only among adolescents who already possess a high baseline level of such competence. This underscores the importance of interventions that address underlying motivational goals and promote prosocial skills, while also highlighting the potential need for differentiated approaches based on individuals' existing strategic repertoires.
The Coronavirus Disease 2019 (COVID-19) pandemic has highlighted the significance of reliable molecular biomarkers in clinical use. Despite the popularity of traditional statistical approaches, the high dimensionality of transcriptomic data presents challenges for these conventional methods. While artificial intelligence (AI) algorithms have emerged as highly advantageous for handling these complex datasets, there is a lack of evaluation of these approaches in COVID-19 transcriptomic studies. This review aims to evaluate studies that used AI for transcriptomic biomarker discovery in COVID-19, assessing their study designs, methodologies, and outcomes. Based on a comprehensive literature search across five databases (Web of Science Core Collection, Scopus, PubMed/MEDLINE, IEEE Xplore Digital Library, and LitCovid) from December 2019 to March 2025, this review selected 63 studies for a narrative synthesis in four key sections: (i) The Landscape of AI-Driven COVID-19 Transcriptomics, (ii) Limitations of Studies, (iii) A Proposed AI-Driven Transcriptomics Framework, and (iv) Clinical Translation Challenges, Opportunities, and Future Directions. Our analysis revealed limitations in data quality, sample size, and heterogeneity, as well as methodological weaknesses in validation and interpretability. We therefore propose an evidence-informed workflow that addresses these limitations in study design while acknowledging real-world constraints. We further discuss the emerging potential of agentic AI systems as a promising solution to current limitations. By bridging methodological gaps with translation considerations, this review can enhance pandemic response strategies for future emerging infectious diseases. Key Points: Applications in the reviewed studies mainly concerned diagnosis and severity stratification of COVID-19 patients.
The limitations of current studies included small sample sizes, the reliance on public datasets lacking detailed metadata, batch effects and data heterogeneity reducing model robustness, the lack of external validation, risks of data leakage and circular validation leading to inflated performance metrics, and challenges in model interpretability. An evidence-informed AI-driven framework is proposed, acknowledging real-world constraints including small pandemic cohort sizes, domain shift from viral evolution, and resource-limited settings, with emerging agentic AI systems offering potential solutions.