Acute poly-substance intoxication requires rapid, life-saving decisions under substantial uncertainty, as clinicians must rely on incomplete ingestion details and nonspecific symptoms. Effective diagnostic reasoning in this chaotic environment requires fusing unstructured, non-medical narratives (e.g., paramedic scene descriptions and unreliable patient self-reports or known histories) with structured medical data such as vital signs. While Large Language Models (LLMs) show potential for processing such heterogeneous inputs, they struggle in this setting, often underperforming simple baselines that rely solely on patient histories. To address this, we present DeToxR (Decision-support for Toxicology with Reasoning), the first adaptation of Reinforcement Learning (RL) to emergency toxicology. We design a robust data-fusion engine for multi-label prediction across 14 substance classes, based on an LLM fine-tuned with Group Relative Policy Optimization (GRPO). We optimize the model's reasoning directly using a clinical performance reward: by formulating a multi-label agreement metric as the reward signal, the model is explicitly penalized both for missing co-ingested substances and for hallucinating substances that were never ingested.
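The abstract does not spell out the reward; a minimal sketch, assuming an F1-style multi-label agreement score over the 14 substance classes (the class names and helper below are illustrative, not DeToxR's actual formula), shows how such a reward penalizes both error types:

```python
# Hypothetical sketch of a multi-label agreement reward for GRPO fine-tuning.
# An F1-style score over substance classes penalizes both missed co-ingestants
# (false negatives) and hallucinated substances (false positives).

def agreement_reward(predicted: set[str], true: set[str]) -> float:
    """F1-style reward in [0, 1] for one case."""
    if not predicted and not true:
        return 1.0  # correctly predicted "no substances"
    tp = len(predicted & true)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)   # hallucinations lower precision
    recall = tp / len(true)           # missed co-ingestants lower recall
    return 2 * precision * recall / (precision + recall)

# Example: model misses one co-ingested substance and hallucinates another.
print(agreement_reward({"opioids", "benzodiazepines"},
                       {"opioids", "ethanol"}))  # 0.5
```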
This paper presents a quasi-sequential optimal design framework for toxicology experiments, specifically applied to sea urchin embryos. We propose a novel approach combining robust optimal design with adaptive, stage-based testing to improve efficiency in toxicological studies, particularly where traditional uniform designs fall short. The methodology uses statistical models to refine dose levels across experimental phases, aiming for increased precision while reducing costs and complexity. Key components include selecting an initial design, iteratively optimizing doses based on preliminary results, and assessing various model fits to ensure robust, data-driven adjustments. Through case studies, we demonstrate improved statistical efficiency and adaptability in toxicology, with potential applications in other experimental domains.
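The iterative step can be illustrated with a deliberately simplified two-stage sketch; the logistic dose-response form, the stage-1 data, and the target effect quantiles below are all assumptions, not the paper's design criterion:

```python
# Illustrative two-stage sketch of the quasi-sequential idea: fit a logistic
# dose-response to stage-1 data, then place stage-2 doses at quantiles of the
# fitted curve instead of reusing a uniform grid.
import numpy as np
from scipy.optimize import curve_fit

def logistic(d, ec50, slope):
    return 1.0 / (1.0 + np.exp(-slope * (np.log(d) - np.log(ec50))))

# Stage 1: uniform log-spaced doses with (hypothetical) observed effect fractions.
doses1 = np.logspace(-1, 2, 6)
resp1 = np.array([0.02, 0.05, 0.20, 0.55, 0.90, 0.98])
(ec50, slope), _ = curve_fit(logistic, doses1, resp1, p0=(5.0, 1.0))

# Stage 2: concentrate doses where the fitted curve is most informative.
targets = np.array([0.2, 0.4, 0.5, 0.6, 0.8])     # effect quantiles
doses2 = ec50 * np.exp(np.log(targets / (1 - targets)) / slope)
print(np.round(doses2, 2))
```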
In toxicology research, experiments are often conducted to determine the effect of toxicant exposure on the behavior of mice, where mice are randomized to receive the toxicant or not. In particular, in fixed interval experiments, one provides a mouse with reinforcers (e.g., a food pellet), contingent upon some action taken by the mouse (e.g., a press of a lever), but the reinforcers are only provided after fixed time intervals. Often, to analyze fixed interval experiments, one specifies and then estimates the conditional state-action distribution (e.g., using an ANOVA). This existing approach, which in the reinforcement learning framework would be called modeling the mouse's "behavioral policy," is sensitive to misspecification. It is likely that any model for the behavioral policy is misspecified; a mapping from a mouse's exposure to their actions can be highly complex. In this work, we avoid specifying the behavioral policy by instead learning the mouse's reward function. Specifying a reward function is as challenging as specifying a behavioral policy, but we propose a novel approach that incorporates knowledge of the optimal behavior, which is often known to the experimenter, to avoid misspecifying it.
Traditional animal testing for toxicity is expensive, time-consuming, ethically questioned, sometimes inaccurate because of the necessity to extrapolate from animal to human, and in most cases not formally validated according to modern standards. This is driving regulatory bodies and companies to back alternative methods focusing on in silico and in vitro approaches. These are complex to implement and validate, and their wide adoption is not yet established, despite legal directives providing an imperative. It is difficult to link a cell-level response to effects on a whole organism, but advances in high-throughput toxicogenomics towards elucidating the mechanism of action of substances are gradually reducing this gap and fostering the adoption of Next Generation Safety Assessment approaches. Quantitative in vitro to in vivo extrapolation (QIVIVE) methods hold the promise of revealing how to use in vitro -omics data to predict the potential for in vivo toxicity. They could improve the prioritisation of lead compounds, reduce time and costs (including the number of animal lives used), and help with the complexity of extrapolating between species. We provide a description of the state of the art of QIVIVE.
We provide a systematic treatment of $D$-optimal design for binary regression and quantal response models in toxicology studies. For the two-parameter case, we provide an analytical equation (the WC equation) for computing the $D$-optimal design quickly; when an analytical solution is not available, we apply particle swarm optimization to solve for the $D$-optimal design. Examples with various link functions are given, together with their sensitivity functions. We extend the two-parameter case to the three-parameter case by providing a neat formula for the determinant of the information matrix, and we suggest that practitioners work with this formula to derive optimal designs for three-parameter binary regression models.
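The WC equation itself is not reproduced here; as a numerical illustration of the underlying criterion, the following sketch maximizes the determinant of the Fisher information matrix for a symmetric two-point design under the logistic link, recovering the classical optimum at logits of about ±1.543:

```python
# Numerical sketch (not the paper's WC equation): D-optimal two-point design
# for the two-parameter logistic model p(x) = 1/(1 + exp(-(a + b*x))).
# For a symmetric equal-weight design at logits z and -z, maximize det of the
# Fisher information matrix over z.
import numpy as np
from scipy.optimize import minimize_scalar

def info_det(z: float) -> float:
    """det of the 2x2 information matrix for points at logits +/- z, weight 1/2."""
    p = 1.0 / (1.0 + np.exp(-z))
    w = p * (1.0 - p)                      # logistic weight, equal at +z and -z
    M = 0.5 * w * (np.array([[1, z], [z, z**2]]) +
                   np.array([[1, -z], [-z, z**2]]))
    return np.linalg.det(M)

res = minimize_scalar(lambda z: -info_det(z), bounds=(0.1, 5.0), method="bounded")
z_opt = res.x
print(f"optimal logits: +/-{z_opt:.4f}")           # ~1.5434 (classical result)
print(f"response probs: {1/(1+np.exp(z_opt)):.3f}, {1/(1+np.exp(-z_opt)):.3f}")
```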
In-vivo toxicological studies are characterized by multiple primary endpoints with quite different scales. Whereas guidelines and publications provide various statistical tests for normally distributed endpoints (such as organ weights) and proportions (such as tumor rates), few approaches are available for graded histopathological findings, such as 0, +, ++, +++. This is a basic shortcoming of the statistical analysis, because these graded findings sometimes show a high predictive value for potential toxic effects. Here we discuss different methods comparatively, especially from the viewpoints of i) designs for very small sample sizes and ii) interpretability by toxicologists. A new approach is recommended where a simultaneous test is performed over all class combinations of score levels, such as (0, +) vs (++, +++). Corresponding R code is provided by way of a data example.
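The paper supplies R code; as a rough Python analogue (with hypothetical counts, Fisher exact tests, and a Bonferroni adjustment standing in for the exact simultaneous max-t test), the collapsing scheme looks like this:

```python
# Simplified analogue of the proposed approach: for graded findings
# 0, +, ++, +++, test every binary collapsing of adjacent score levels,
# e.g. (0) vs (+,++,+++), (0,+) vs (++,+++), (0,+,++) vs (+++).
from scipy.stats import fisher_exact

# Hypothetical severity counts per grade 0, +, ++, +++ for a small study.
control = [8, 2, 0, 0]
treated = [3, 3, 3, 1]
grades = ["0", "+", "++", "+++"]

n_tests = len(grades) - 1                 # number of binary collapsings
for c in range(1, len(grades)):
    table = [[sum(control[:c]), sum(control[c:])],
             [sum(treated[:c]), sum(treated[c:])]]
    _, p = fisher_exact(table)
    p_adj = min(1.0, p * n_tests)         # Bonferroni in place of exact max-t
    print(f"({','.join(grades[:c])}) vs ({','.join(grades[c:])}): "
          f"p_adj = {p_adj:.3f}")
```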
Claiming similarity of multiple dose-response curves in interlaboratory studies is a relevant issue during the assay validation process in regulatory toxicology. Here we demonstrate the use of dose-by-laboratory interaction contrasts, particularly Williams-type by total-mean contrasts. With the CRAN packages statint and multcomp in the open-source software R, the estimation of adjusted p-values or compatible simultaneous confidence intervals is relatively easy. The interpretation in terms of global or partial equivalence, i.e. similarity, is challenging, because thresholds are not available a priori. The approach is demonstrated on selected in-vitro Ames MPF assay data.
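The estimation itself is done with statint and multcomp in R; the following numpy sketch only shows the structure of the Williams-type contrast matrix for a single laboratory (analogous to multcomp::contrMat with type "Williams"), which is then crossed with laboratory indicators to form the interaction contrasts:

```python
import numpy as np

def williams_contrasts(n: list[int]) -> np.ndarray:
    """Williams-type contrast matrix for group sizes n = [control, d1, ..., dk].

    Row i compares the control against the pooled (sample-size weighted)
    mean of the i highest dose groups.
    """
    k = len(n) - 1                       # number of dose groups
    C = np.zeros((k, k + 1))
    for i in range(1, k + 1):
        top = n[-i:]                     # the i highest dose groups
        C[i - 1, 0] = -1.0               # control coefficient
        C[i - 1, -i:] = np.array(top) / sum(top)
    return C

# Balanced example with a control and three doses:
print(williams_contrasts([10, 10, 10, 10]))
# [[-1.    0.    0.    1.   ]
#  [-1.    0.    0.5   0.5  ]
#  [-1.    0.333 0.333 0.333]]
```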
The separate evaluation of males and females is the current standard in in-vivo toxicology for assessing dose or treatment effects using Dunnett tests. The alternative, a pre-test for sex-by-treatment interaction, is problematic. Here a joint test is proposed that considers the two sex-specific and the pooled Dunnett-type comparisons together. The calculation of either simultaneous confidence intervals or adjusted p-values with the R package multcomp is demonstrated using a real data example.
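As a rough illustration (the paper computes one joint max-t test via multcomp; the sketch below instead runs the three Dunnett families separately on hypothetical data, with a conservative Bonferroni adjustment across families):

```python
import numpy as np
from scipy.stats import dunnett

rng = np.random.default_rng(1)
# Hypothetical organ-weight data: control + two doses, per sex.
males = [rng.normal(mu, 1.0, size=10) for mu in (5.0, 5.2, 5.9)]
females = [rng.normal(mu, 1.0, size=10) for mu in (4.0, 4.1, 4.3)]
pooled = [np.concatenate([m, f]) for m, f in zip(males, females)]

for name, (ctrl, *doses) in {"males": males, "females": females,
                             "pooled": pooled}.items():
    res = dunnett(*doses, control=ctrl)       # Dunnett-type comparisons
    p_adj = np.minimum(1.0, res.pvalue * 3)   # Bonferroni over the 3 families
    print(name, np.round(p_adj, 3))
```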
With the National Toxicology Program having issued its final report on cell phone radiation and cancer in rats, one can draw the following conclusions from its data. The rates of gliomas (brain cancers) and schwannomas (cancers of the nerve sheaths around the heart) increase roughly linearly with increased absorption of 900 MHz radiofrequency radiation in male rats. The rate of these cancers in female rats is about one third the rate in male rats; the rate of gliomas in female humans is about two thirds the rate in male humans. Both of these observations can be explained by a decreased sensitivity to chemical carcinogenesis in both female rats and female humans. The increase in male rat life spans with increased radiofrequency absorption is due to a reduction in kidney failure caused by a decrease in food intake. No similar increase in the life span of humans who use cell phones is expected.
We consider design issues for toxicology studies when we have a continuous response and the true mean response is only known to be a member of a class of nested models. This class of non-linear models was proposed by toxicologists who were concerned only with estimation problems. We develop robust and efficient designs for model discrimination and for estimating parameters in the selected model at the same time. In particular, we propose designs that maximize the minimum of $D$- or $D_1$-efficiencies over all models in the given class. We show that our optimal designs are efficient for determining an appropriate model from the postulated class, quite efficient for estimating model parameters in the identified model and also robust with respect to model misspecification. To facilitate the use of optimal design ideas in practice, we have also constructed a website that freely enables practitioners to generate a variety of optimal designs for a range of models and also enables them to evaluate the efficiency of any design.
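A plausible formalization of the maximin criterion described above, with notation assumed rather than taken from the paper:

```latex
% M(\xi,\theta_m) is the information matrix of design \xi under model m
% (with p_m parameters), and \xi^*_m is the D-optimal design for model m.
\[
  \mathrm{eff}_D(\xi; m)
    = \left( \frac{\det M(\xi, \theta_m)}{\det M(\xi^{*}_{m}, \theta_m)} \right)^{1/p_m},
  \qquad
  \xi^{*} = \arg\max_{\xi} \; \min_{m \in \mathcal{M}} \; \mathrm{eff}_D(\xi; m).
\]
```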
As the volume and complexity of nonclinical toxicology studies continue to increase, toxicologic pathology reporting faces persistent challenges, including fragmented sources of data (e.g., histopathology images, clinical pathology and other study data, adverse effects databases, mechanistic literature), variable reporting timelines, and heightened regulatory expectations. This white paper examines the emerging role of agentic artificial intelligence (AI) in addressing these issues through coordinated workflow orchestration, data integration, and pathologist-in-the-loop report generation. Based on a closed-door roundtable held during the 2025 Society of Toxicologic Pathology (STP) Annual Meeting and follow-on discussions, this paper synthesizes the perspectives of leading toxicologic pathologists, toxicologists, and AI developers. It outlines the key pain points in current reporting workflows, identifies realistic near-term use cases for agentic AI, and describes major adoption barriers, including requirements for transparency, validation, and organizational readiness. A phased adoption roadmap and pilot design considerations are proposed to help support responsible evaluation and deployment.
In pharmaceutical and toxicological research, historical control data are increasingly used to validate concurrent control groups, typically via the construction of historical control limits. While methods have been described for continuous and dichotomous endpoints, approaches for overdispersed multinomial data, common in developmental and reproductive toxicology or histopathology, are currently lacking. This article introduces and compares methods for constructing simultaneous prediction intervals for future multinomial observations subject to overdispersion. We investigate a range of frequentist approaches, including asymptotic approximations and bootstrap techniques (incorporating symmetric, asymmetric, and marginal calibration, as well as rank-based methods), alongside Bayesian hierarchical models. Extensive simulation studies assessing simultaneous coverage probability and the balance of lower and upper tail error probabilities show that standard asymptotic methods and simple Bonferroni adjustments yield liberal intervals, especially for small sample sizes or rare event categories. In contrast, bootstrap methods, specifically the Marginal Calibration and Rank-Based Simultaneous approaches, achieve coverage much closer to the nominal level.
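As a rough sketch of the calibration idea (not the paper's exact Marginal Calibration or Rank-Based procedures), one can widen per-category bootstrap quantiles until simultaneous coverage reaches the nominal level; the Dirichlet-multinomial parameters below are assumptions, and for brevity the same bootstrap sample is reused for calibration:

```python
import numpy as np

rng = np.random.default_rng(7)
pi_hat = np.array([0.85, 0.10, 0.04, 0.01])   # estimated category probabilities
phi, n_future, B = 20.0, 50, 4000             # assumed concentration, trial size

def sample_dm(size):
    """Parametric bootstrap of overdispersed multinomial (Dirichlet-multinomial)."""
    p = rng.dirichlet(phi * pi_hat, size=size)
    return np.array([rng.multinomial(n_future, pi) for pi in p])

boot = sample_dm(B)                           # B simulated future observations

def simultaneous_coverage(alpha_local):
    lo = np.quantile(boot, alpha_local / 2, axis=0)
    hi = np.quantile(boot, 1 - alpha_local / 2, axis=0)
    inside = np.all((boot >= lo) & (boot <= hi), axis=1)
    return inside.mean(), lo, hi

# Calibrate the per-category level until simultaneous coverage reaches 95%.
for alpha_local in np.linspace(0.05, 0.001, 50):
    cov, lo, hi = simultaneous_coverage(alpha_local)
    if cov >= 0.95:
        break
print(f"local alpha = {alpha_local:.4f}, simultaneous coverage = {cov:.3f}")
print("intervals per category:", list(zip(lo.astype(int), hi.astype(int))))
```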
In pre- and non-clinical toxicology, the reduction of animal use is highly desirable. Although approaches for possible sample size reduction in the concurrent control group were suggested previously under the virtual control groups framework for continuous endpoints, methodology applicable to the binary outcomes that occur in long-term carcinogenicity studies is currently missing. In order to augment the animals in the current control group with historical control data, we propose approaches that rely on dynamic Bayesian borrowing and simultaneous credible intervals for risk ratios. Several operating characteristics, such as the familywise error rate (FWER) and power, are assessed via Monte-Carlo simulations and compared to those of approaches that rely on pooling historical and current observations. It turns out that, under optimal conditions, Bayesian approaches based on robustified prior distributions enable a substantial reduction of the control group's sample size while still controlling the FWER to a satisfactory level. Furthermore, at least to some extent, these approaches were able to protect against possible drift. This highlights the potential of Bayesian study designs for reducing animal use in long-term carcinogenicity studies.
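A minimal sketch of the borrowing idea, assuming a robust mixture prior on a single control tumor rate in PyMC (the paper's models, priors, and simultaneous intervals across several dose groups are more elaborate):

```python
# Illustrative dynamic borrowing for a binary tumor endpoint: the informative
# Beta component summarizes historical controls; the vague component
# robustifies against prior-data conflict (drift). All counts are hypothetical.
import numpy as np
import pymc as pm

hist_tumors, hist_n = 12, 400          # pooled historical control data
cur_ctrl = dict(tumors=2, n=25)        # small concurrent control group
cur_dose = dict(tumors=7, n=50)

with pm.Model() as model:
    p_ctrl = pm.Mixture(
        "p_ctrl",
        w=[0.8, 0.2],                  # assumed prior weight on borrowing
        comp_dists=[
            pm.Beta.dist(alpha=1 + hist_tumors, beta=1 + hist_n - hist_tumors),
            pm.Beta.dist(alpha=1, beta=1),   # vague, robustifying component
        ],
    )
    p_dose = pm.Beta("p_dose", alpha=1, beta=1)
    pm.Binomial("y_ctrl", n=cur_ctrl["n"], p=p_ctrl, observed=cur_ctrl["tumors"])
    pm.Binomial("y_dose", n=cur_dose["n"], p=p_dose, observed=cur_dose["tumors"])
    pm.Deterministic("risk_ratio", p_dose / p_ctrl)
    idata = pm.sample(2000, tune=1000, progressbar=False)

print(np.quantile(idata.posterior["risk_ratio"].values, [0.025, 0.5, 0.975]))
```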
The task here is to predict the toxicological activity of chemical compounds based on the Tox21 dataset, a benchmark in computational toxicology. After a domain-specific overview of chemical toxicity, we discuss current computational strategies, focusing on machine learning and deep learning. Several architectures are compared in terms of performance, robustness, and interpretability. This research introduces a novel image-based pipeline built on DenseNet121, which processes 2D graphical representations of chemical structures. Additionally, we employ Grad-CAM visualizations, an explainable AI technique, to interpret the model's predictions and highlight molecular regions contributing to toxicity classification. The proposed architecture achieves competitive results compared to traditional models, demonstrating the potential of deep convolutional networks in cheminformatics. Our findings emphasize the value of combining image-based representations with explainable AI methods to improve both predictive accuracy and model transparency in toxicology.
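A minimal Grad-CAM sketch for such a DenseNet121 classifier (the weights, input rendering, and 12-task Tox21 head below are assumptions, not the paper's exact pipeline):

```python
import torch
import torch.nn.functional as F
from torchvision.models import densenet121

model = densenet121(weights=None)
model.classifier = torch.nn.Linear(model.classifier.in_features, 12)  # Tox21 tasks
model.eval()

img = torch.randn(1, 3, 224, 224)        # stand-in for a rendered molecule image
feats = model.features(img)              # last conv feature maps (1, 1024, 7, 7)
feats.retain_grad()                      # keep gradients on this non-leaf tensor

# Reproduce DenseNet's classification head explicitly so feats stays in-graph.
out = model.classifier(F.adaptive_avg_pool2d(F.relu(feats), 1).flatten(1))
out[0, 0].backward()                     # gradient of one assay's score

# Grad-CAM: weight each map by its mean gradient, combine, ReLU, upsample.
weights = feats.grad.mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
print(cam.shape)                         # (1, 1, 224, 224) relevance heatmap
```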
Drug-induced toxicity remains a leading cause of failure in preclinical development and early clinical trials. Detecting adverse effects at an early stage is critical to reduce attrition and accelerate the development of safe medicines. Histopathological evaluation remains the gold standard for toxicity assessment, but it relies heavily on expert pathologists, creating a bottleneck for large-scale screening. To address this challenge, we introduce an AI-based anomaly detection framework for histopathological whole-slide images (WSIs) of rodent livers from toxicology studies. The system identifies healthy tissue and known pathologies (anomalies) for which training data are available. In addition, it can detect rare pathologies without training data as out-of-distribution (OOD) findings. We generate a novel dataset of pixelwise annotations of healthy tissue and known pathologies and use these data to fine-tune a pre-trained Vision Transformer (DINOv2) via Low-Rank Adaptation (LoRA) for tissue segmentation. Finally, we extract features for OOD detection using the Mahalanobis distance. To better account for class-dependent variability in histological data, we propose the use of class-specific covariance estimates in the Mahalanobis-distance computation.
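A baseline sketch of the Mahalanobis OOD score on extracted features, using per-class means with a shared pooled covariance (the class-dependent refinement mentioned above would replace the pooled estimate with class-specific ones):

```python
import numpy as np

def fit_mahalanobis(feats: np.ndarray, labels: np.ndarray):
    """Per-class means and a shared (pooled) covariance from in-distribution data."""
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([feats[labels == c] - means[c] for c in classes])
    cov_inv = np.linalg.pinv(np.cov(centered, rowvar=False))
    return means, cov_inv

def ood_score(x: np.ndarray, means, cov_inv) -> float:
    """Minimum Mahalanobis distance to any known class; large => likely OOD."""
    dists = [np.sqrt((x - m) @ cov_inv @ (x - m)) for m in means.values()]
    return min(dists)

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))            # stand-in for extracted features
labels = rng.integers(0, 3, 200)              # 3 known tissue/pathology classes
means, cov_inv = fit_mahalanobis(feats, labels)
print(ood_score(rng.normal(5.0, 1.0, size=16), means, cov_inv))  # far-away sample
```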
Using LLMs in healthcare, Computer-Supported Cooperative Work, and Social Computing requires examining ethical and social norms to ensure their safe incorporation into human life. We conducted a mixed-method study, including an online survey with 111 participants and an interview study with 38 experts, to investigate AI ethics and social norms in ChatGPT as an everyday-life tool. This study aims to evaluate whether ChatGPT, in an empirical context, operates in accordance with ethical and social norms, which is critical for understanding actions in industrial and academic research and for achieving machine ethics. The findings provide initial insights into six important aspects of AI ethics: bias, trustworthiness, security, toxicology, social norms, and ethical data. Significant obstacles related to transparency and bias in unsupervised data collection methods are identified as ethical concerns for ChatGPT.
Class imbalance is a pervasive problem in predictive toxicology, where the number of non-toxic compounds often exceeds the number of toxic ones. Models trained on such data often perform well on the majority class but poorly on the minority class, which is most relevant for safety assessment. We propose a simple and general Bayesian framework that addresses class imbalance by modifying the likelihood function. Each observation's likelihood is raised to a power inversely proportional to its class proportion, with the weights normalized to preserve the overall information content. This weighted-likelihood (or power-likelihood) approach embeds cost-sensitive learning directly into Bayesian updating. The method is demonstrated using simulated binary data and an ordered logistic model for drug-induced liver injury (DILI). Weighting alters parameter estimates and decision boundaries, improving balanced accuracy and sensitivity for the minority (toxic) class. The approach can be implemented with minimal changes in standard probabilistic programming languages such as Stan, PyMC, and Turing.jl. This framework provides an easily extensible foundation for developing Bayesian prediction models under class imbalance.
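Since PyMC is named among the suitable languages, here is a minimal sketch of the weighted likelihood via a Potential term, on synthetic binary data (the paper's DILI model is ordered logistic; a plain logistic regression is used here for brevity):

```python
# Each observation's log-likelihood is scaled by a weight inversely
# proportional to its class proportion, normalized so the weights sum to n
# (preserving the overall information content).
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
x = rng.normal(size=300)
y = rng.binomial(1, 1 / (1 + np.exp(-(x - 2.0))))    # rare "toxic" class

prop = np.array([np.mean(y == 0), np.mean(y == 1)])
w = 1.0 / prop[y]
w *= len(y) / w.sum()                                # normalize: sum(w) == n

with pm.Model() as model:
    alpha = pm.Normal("alpha", 0, 2.5)
    beta = pm.Normal("beta", 0, 2.5)
    p = pm.math.invlogit(alpha + beta * x)
    # Weighted (power) likelihood via a Potential instead of an observed RV.
    pm.Potential("wl", (w * pm.logp(pm.Bernoulli.dist(p=p), y)).sum())
    idata = pm.sample(1000, tune=1000, progressbar=False)
```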
Geometric deep learning is an emerging technique in Artificial Intelligence (AI)-driven cheminformatics; however, the implications of different Graph Neural Network (GNN) architectures are poorly explored in this space. This study compares the performance of Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), and Graph Isomorphism Networks (GINs), applied to 7 toxicological assay datasets of varying data abundance and endpoint, on binary classification of assay activation. Following pre-processing of molecular graphs, enforcement of class balance, and stratification of all datasets across 5 folds, Bayesian optimisations were carried out for each GNN applied to each assay dataset (resulting in 21 unique Bayesian optimisations). Optimised GNNs achieved Area Under the Curve (AUC) scores ranging from 0.728 to 0.849 (averaged across all folds), naturally varying between specific assays and GNNs. GINs consistently outperformed GCNs and GATs on the top 5 of the 7 most data-abundant toxicological assays, while GATs significantly outperformed on the remaining 2 most data-scarce assays. This indicates that GINs are the better-suited architecture for data-abundant toxicological assays, whereas GATs may be preferable when data are scarce.
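A minimal GIN classifier in PyTorch Geometric of the kind compared in the study (hyperparameters here are illustrative, not the Bayesian-optimised ones):

```python
import torch
from torch.nn import Linear, ReLU, Sequential
from torch_geometric.nn import GINConv, global_add_pool

class GIN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        def mlp(i, o):
            return Sequential(Linear(i, hidden), ReLU(), Linear(hidden, o))
        self.conv1 = GINConv(mlp(in_dim, hidden))
        self.conv2 = GINConv(mlp(hidden, hidden))
        self.head = Linear(hidden, 1)          # binary assay activation

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        x = global_add_pool(x, batch)          # sum-pooling, key to GIN's power
        return self.head(x).squeeze(-1)        # logits; pair with BCEWithLogitsLoss

model = GIN(in_dim=9)                          # e.g. 9 atom features per node
```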
Mathematical modeling in systems toxicology enables a comprehensive understanding of the effects of pharmaceutical substances on cardiac health. However, the complexity of these models limits their widespread application in early drug discovery. In this paper, we introduce a novel approach to solving parameterized models of cardiac action potentials by combining meta-learning techniques with Systems Biology-Informed Neural Networks (SBINNs). The proposed method, hyperSBINN, effectively addresses the challenge of predicting the effects of various compounds at different concentrations on cardiac action potentials, outperforming traditional differential equation solvers in speed. Our model efficiently handles scenarios with limited data and complex parameterized differential equations. The hyperSBINN model demonstrates robust performance in predicting APD90 values, indicating its potential as a reliable tool for modeling cardiac electrophysiology and aiding in preclinical drug development. This framework represents an advancement in computational modeling, offering a scalable and efficient solution for simulating and understanding complex biological systems.
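A conceptual sketch of the hypernetwork component (all architectural details below are assumptions, and the SBINN's physics-informed ODE residual loss is omitted): a small MLP maps compound descriptors to the weights of a target network predicting the action potential over time.

```python
import torch
import torch.nn as nn

class HyperSBINN(nn.Module):
    def __init__(self, n_params: int = 4, hidden: int = 32):
        super().__init__()
        self.hidden = hidden
        # weights + biases of a 1-hidden-layer target net: t -> V(t)
        n_weights = (1 * hidden + hidden) + (hidden * 1 + 1)
        self.hypernet = nn.Sequential(
            nn.Linear(n_params, 64), nn.Tanh(), nn.Linear(64, n_weights))

    def forward(self, params: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        h = self.hidden
        w = self.hypernet(params)                        # generated weights
        w1, b1 = w[:h].view(h, 1), w[h:2 * h]
        w2, b2 = w[2 * h:3 * h].view(1, h), w[3 * h]
        z = torch.tanh(t @ w1.T + b1)                    # (T, hidden)
        return (z @ w2.T + b2).squeeze(-1)               # V(t), shape (T,)

model = HyperSBINN()
t = torch.linspace(0, 1, 100).unsqueeze(-1)              # time grid
v = model(torch.tensor([1.0, 0.5, 0.2, 0.1]), t)         # one compound setting
```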
In the field of environmental toxicology, rapid and precise assessment of the inflammatory response to pollutants in biological models is critical. This study leverages the power of deep learning to enable automated assessments of zebrafish, a model organism widely used for its translational relevance to human disease pathways. We present an innovative approach to assessing inflammatory responses in zebrafish exposed to various pollutants through an end-to-end deep learning model. The model employs a Unet-based architecture to automatically process high-throughput lateral zebrafish images, segmenting specific regions and quantifying neutrophils as inflammation markers. Alongside imaging, qPCR analysis offers gene expression insights, revealing the molecular impact of exposure on inflammatory pathways. Moreover, the deep learning model was packaged as a user-friendly executable file (.exe), facilitating widespread application by enabling use on virtually any computer without the need for specialized software or training.
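The quantification step can be sketched as connected-component counting on the model's predicted neutrophil mask (the mask and size threshold below are stand-ins for the pipeline's actual post-processing):

```python
import numpy as np
from scipy import ndimage

mask = np.zeros((64, 64), dtype=bool)           # stand-in for a predicted mask
mask[10:13, 10:13] = mask[40:44, 50:53] = True  # two "neutrophils"

labeled, n_cells = ndimage.label(mask)          # connected components
sizes = ndimage.sum_labels(mask, labeled, index=range(1, n_cells + 1))
n_valid = int(np.sum(sizes >= 4))               # drop specks below a size threshold
print(f"neutrophil count: {n_valid}")
```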