Human experts are often engaged in the development of machine learning systems to collect and validate data, consult on algorithm development, and evaluate system performance. At the same time, who counts as an 'expert' and what constitutes 'expertise' is not always explicitly defined. In this work, we review 112 academic publications that explicitly reference 'expert' and 'expertise' and that describe the development of machine learning (ML) systems to survey how expertise is characterized and the role experts play. We find that expertise is often undefined and forms of knowledge outside of formal education and professional certification are rarely sought, which has implications for the kinds of knowledge that are recognized and legitimized in ML development. Moreover, we find that expert knowledge tends to be utilized in ways focused on mining textbook knowledge, such as through data annotation. We discuss the ways experts are engaged in ML development in relation to deskilling, the social construction of expertise, and implications for responsible AI development. We point to a need for reflection and specificity in justifications of domain expert engagement, both as a matter of
Large language models can often close proof gaps in interactive theorem provers, but a verified theorem is not the same thing as a reusable library contribution. We study this distinction through a detailed case study: a semi-autonomous formalization of Grothendieck's vanishing theorem. The initial version compiles with no sorries, but an expert review found serious problems in definitions, theorem generality, file organization, and the API. We then ran a review-driven refactor and compression process and obtained a second expert review. The before-and-after comparison shows a sharp split: agents adapted well to local, mechanically checkable feedback, but remained weak at choosing definitions and designing APIs. We argue that autoformalization should be evaluated not only by closed sorries, but by whether the resulting formalization survives expert review.
LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only reviewers are using LLM-assistance, but also that authors use LLMs to revise their papers before submitting. In this work, we perform empirical experiments on papers from the 2025 ACL Rolling Review (ARR) to evaluate LLM reviews from both the author and the reviewer perspective. First, we identify a limited alignment of LLM reviews with human ones. In the best-case scenario, the alignment is reasonable. However, we also find that LLM-human alignment varies substantially across prompts and models. Finally, we investigate the scenario in which the author uses an iterative draft-revise workflow to improve the submission according to the LLM review. We find that this "gaming" of LLM reviews can be effective in specific scenarios, leading to a statistically significant increase of overall scores for up to 35\% of papers. We publish our code: https://github.com/uhh-hcds/reviewarcade.
Chimeric Antigen Receptor (CAR) T-cell therapy is an immunotherapy that has recently become highly instrumental in the fight against life-threatening diseases. A variety of modeling and computational simulation efforts have addressed different aspects of CAR T therapy, including T-cell activation, T- and malignant cell population dynamics, therapeutic cost-effectiveness strategies, and patient survival analyses. In this article, we present a systematic review of those efforts, including mathematical, statistical, and stochastic models employing a wide range of algorithms, from differential equations to machine learning. To the best of our knowledge, this is the first review of all such models studying CAR T therapy. In this review, we provide a detailed summary of the strengths, limitations, methodology, data used, and data lacking in current published models. This information may help in designing and building better models for enhanced prediction and assessment of the benefit-risk balance associated with novel CAR T therapies, as well as with the data collection essential for building such models.
By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures, including Mixtral, Deepseek-MoE, and Qwen. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. We will release our code to facilitate future research.
A computer Program Capable of performing at a human-expert level in a narrow problem domain area is called an expert system. Management of uncertainty is an intrinsically important issue in the design of expert systems because much of the information in the knowledge base of a typical expert system is imprecise, incomplete or not totally reliable. In this paper, the author present s the review of past work that has been carried out by various researchers based on development of expert systems for the diagnosis of cardiac disease
Cardiovascular events, such as heart attacks and strokes, remain a leading cause of mortality globally, necessitating meticulous monitoring and adjudication in clinical trials. This process, traditionally performed manually by clinical experts, is time-consuming, resource-intensive, and prone to inter-reviewer variability, potentially introducing bias and hindering trial progress. This study addresses these critical limitations by presenting a novel framework for automating the adjudication of cardiovascular events in clinical trials using Large Language Models (LLMs). We developed a two-stage approach: first, employing an LLM-based pipeline for event information extraction from unstructured clinical data and second, using an LLM-based adjudication process guided by a Tree of Thoughts approach and clinical endpoint committee (CEC) guidelines. Using cardiovascular event-specific clinical trial data, the framework achieved an F1-score of 0.82 for event extraction and an accuracy of 0.68 for adjudication. Furthermore, we introduce the CLEART score, a novel, automated metric specifically designed for evaluating the quality of AI-generated clinical reasoning in adjudicating cardiovascul
Sparse expert models are a thirty-year old concept re-emerging as a popular architecture in deep learning. This class of architecture encompasses Mixture-of-Experts, Switch Transformers, Routing Networks, BASE layers, and others, all with the unifying idea that each example is acted on by a subset of the parameters. By doing so, the degree of sparsity decouples the parameter count from the compute per example allowing for extremely large, but efficient models. The resulting models have demonstrated significant improvements across diverse domains such as natural language processing, computer vision, and speech recognition. We review the concept of sparse expert models, provide a basic description of the common algorithms, contextualize the advances in the deep learning era, and conclude by highlighting areas for future work.
Electrocardiogram (ECG) foundation models represent a paradigm shift from task-specific pipelines to generalizable architectures pre-trained on large-scale unlabeled waveform data. This survey presents a unified and deployment-aware review of foundation models and medical large language models (LLMs) for ECG intelligence in cardiovascular disease (CVD) diagnosis, monitoring, and clinical decision support. The central thesis of this survey paper is that next-generation cardiovascular AI systems will be inherently agentic, requiring the synergistic integration of two complementary model classes: (i) ECG foundation models that act as signal-level interpreters, learning rich electrophysiological representations via self-supervised and multimodal pretraining, and (ii) medical LLMs, trained on biomedical text corpora, that function as knowledge-based reasoning backbones for contextual inference, guideline alignment, and clinical decision support. Thus, the survey systematically reviews existing pool of generalist medical LLMs, as well as ECG foundation models that utilize techniques such as self-supervised learning, multimodal ECG-language alignment, vision transformer architectures, and
Mild cognitive impairment (MCI) leading to dementia results in a constellation of psychiatric disorders including depression, mood disorders, schizophrenia and others. With increasing age, mild cognitive impairment leads to increased disability-adjusted life-years and healthcare burden. A huge number of drug trials for the treatment of MCI associated with Alzheimer's disease have undergone failure leading to the development of drugs that could avert the progression of the disease. However, some novel non-drug-based therapies like ultrasound ablation of amyloid plaques have influenced researchers to explore the non-pharmacological modalities for the treatment of mild cognitive impairment. To compensate for neurodegenerative loss resulting in coexisting psychiatric disorders, neurofeedback therapy has also been proven to improve behavioural outcomes by inducing neuroplasticity. The aim of the current review is to highlight the pathophysiological aspects of mild cognitive impairment leading to dementia that could be addressed with no pharmacological interventions and to understand the mechanisms behind the effects of these interventions.
This paper presents a systematic literature review of published studies on AI-based automated speech therapy tools for persons with speech sound disorders (SSD). The COVID-19 pandemic has initiated the requirement for automated speech therapy tools for persons with SSD making speech therapy accessible and affordable. However, there are no guidelines for designing such automated tools and their required degree of automation compared to human experts. In this systematic review, we followed the PRISMA framework to address four research questions: 1) what types of SSD do AI-based automated speech therapy tools address, 2) what is the level of autonomy achieved by such tools, 3) what are the different modes of intervention, and 4) how effective are such tools in comparison with human experts. An extensive search was conducted on digital libraries to find research papers relevant to our study from 2007 to 2022. The results show that AI-based automated speech therapy tools for persons with SSD are increasingly gaining attention among researchers. Articulation disorders were the most frequently addressed SSD based on the reviewed papers. Further, our analysis shows that most researchers pr
In this paper we present architecture of a fuzzy expert system used for therapy of dyslalic children. With fuzzy approach we can create a better model for speech therapist decisions. A software interface was developed for validation of the system. The main objectives of this task are: personalized therapy (the therapy must be in according with child's problems level, context and possibilities), speech therapist assistant (the expert system offer some suggestion regarding what exercises are better for a specific moment and from a specific child), (self) teaching (when system's conclusion is different that speech therapist's conclusion the last one must have the knowledge base change possibility).
This is a protocol for a systematic review of the effects of music therapy on offenders. Based on randomised controlled trials, the review aims to assess the effectiveness of music therapy on adolescent and adult offenders in custodial institutions including forensic psychiatric hospitals, and offenders or probationers in the community. The outcomes to be evaluated include alleviated symptoms of mental illness, psychosocial competencies and reduced recidivism.
Expert domain writing, such as scientific writing, typically demands extensive domain knowledge. Although large language models (LLMs) show promising potential in this task, evaluating the quality of automatically generated scientific writing is a crucial open issue, as it requires knowledge of domain-specific criteria and the ability to discern expert preferences. Conventional automatic evaluation metrics and LLM-as-a-judge systems, primarily designed for mainstream NLP tasks, are insufficient to grasp expert preferences and domain-specific quality standards. To address this gap and support realistic human-AI collaborative writing, we focus on related work generation, one of the most challenging scientific tasks, as an exemplar. We propose GREP, a multi-turn evaluation framework that integrates classical related work evaluation criteria with expert-specific preferences. GREP decomposes the evaluation into smaller fine-grained dimensions. This localized evaluation is further augmented with contrastive examples to provide detailed contextual guidance for the evaluation dimensions. Empirical investigation reveals that GREP is able to assess the quality of related work sections in a m
Purpose: Proton therapy provides superior dose conformity compared to photon therapy, but its treatment planning is challenged by sensitivity to anatomical changes, setup/range uncertainties, and computational complexity. This review evaluates the role of artificial intelligence (AI) in improving proton therapy treatment planning. Materials and methods: Recent studies on AI applications in image reconstruction, image registration, dose calculation, plan optimization, and quality assessment were reviewed and summarized by application domain and validation strategy. Results: AI has shown promise in automating contouring, enhancing imaging for dose calculation, predicting dose distributions, and accelerating robust optimization. These methods reduce manual workload, improve efficiency, and support more personalized planning and adaptive planning. Limitations include data scarcity, model generalizability, and clinical integration. Conclusion: AI is emerging as a key enabler of efficient, consistent, and patient-specific proton therapy treatment planning. Addressing challenges in validation and implementation will be essential for its translation into routine clinical practice.
Given the large number of publications in software engineering, frequent literature reviews are required to keep current on work in specific areas. One tedious work in literature reviews is to find relevant studies amongst thousands of non-relevant search results. In theory, expert systems can assist in finding relevant work but those systems have primarily been tested in simulations rather than in application to actual literature reviews. Hence, few researchers have faith in such expert systems. Accordingly, using a realistic case study, this paper assesses how well our state-of-the-art expert system can help with literature reviews. The assessed literature review aimed at identifying test case prioritization techniques for automated UI testing, specifically from 8,349 papers on IEEE Xplore. This corpus was studied with an expert system that incorporates an incrementally updated human-in-the-loop active learning tool. Using that expert system, in three hours, we found 242 relevant papers from which we identified 12 techniques representing the state-of-the-art in test case prioritization when source code information is not available. These results were then validated by six other g
This paper develops Virtual Speech Therapist (VST), an intelligent agent-based platform that streamlines stuttering assessment and delivers customized therapy planning through automated and adaptive AI-driven workflows. VST integrates state-of-the-art deep learning-based stuttering classification, and multi-agent large language model (LLM) reasoning to support evidence-based clinical decision-making. The VST begins with the acquisition and feature extraction of patient speech samples, followed by robust classification of stuttering types. Building on these outputs, VST initiates an agentic reasoning process in which specialized LLM agents autonomously generate, critique, and iteratively refine individualized therapy plans. A dedicated critic agent evaluates all generated therapy plans to ensure clinical safety, methodological soundness, and alignment with peer-reviewed evidence and established professional guidelines. The resulting output is a comprehensive, patient-specific therapy draft intended for clinician review. Incorporating clinician feedback, the system then produces a finalized therapy plan suitable for patient delivery, thereby maintaining a clinician-in-the-loop paradi
Introduction: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. Areas covered: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late big bang of domain combinations. Expert opinion: Two processes, folding and recruitment appear central to the evolutionary progression. The former increases protein persistence. The later fosters diversity. Chronologically,
Recent advances in causal inference techniques, more specifically, in the theory of structural causal models, provide the framework for identification of causal effects from observational data in the cases where the causal graph is identifiable, i.e., the data generating mechanism can be recovered from the joint distribution. However, no such studies have been done to demonstrate this concept with a clinical example. We present a complete framework to estimate the causal effect from observational data by augmenting expert knowledge in the model development phase and with a practical clinical application. Our clinical application entails a timely and important research question, i.e., the effect of oxygen therapy intervention in the intensive care unit (ICU); the result of this project is useful in a variety of disease conditions, including severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) patients in the ICU. We used data from the MIMIC III database, a standard database in the machine learning community that contains 58,976 admissions from an ICU in Boston, MA, for estimating the oxygen therapy effect on morality. We also identified the covariate-specific effect to oxyge
This paper reviews the current progress in applying machine learning (ML) tools to solve NP-hard combinatorial optimization problems, with a focus on routing problems such as the traveling salesman problem (TSP) and the vehicle routing problem (VRP). Due to the inherent complexity of these problems, exact algorithms often require excessive computational time to find optimal solutions, while heuristics can only provide approximate solutions without guaranteeing optimality. With the recent success of machine learning models, there is a growing trend in proposing and implementing diverse ML techniques to enhance the resolution of these challenging routing problems. We propose a taxonomy categorizing ML-based routing methods into construction-based and improvement-based approaches, highlighting their applicability to various problem characteristics. This review aims to integrate traditional OR methods with state-of-the-art ML techniques, providing a structured framework to guide future research and address emerging VRP variants.