Clinical trials are essential to drug development but time-consuming, costly, and prone to failure. Accurate trial outcome prediction based on historical trial data promises better investment decisions and higher trial success rates. Existing trial outcome prediction models were not designed to model the relations among similar trials, capture the progression of features and designs of similar trials, or address the skewness of trial data, which causes inferior performance for less common trials. To fill these gaps and provide accurate trial outcome prediction, we propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT), which first identifies trial topics to cluster multi-sourced trial data into relevant groups. It then generates trial embeddings and organizes them by topic and time to create clinical trial sequences. Treating each trial sequence as a task, it uses a meta-learning strategy to reach a point from which the model can rapidly adapt to new tasks with minimal updates. In particular, the topic discovery module enables a deeper understanding of the underlying structure of the data, while sequential learning captures the evolution of trial designs.
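The meta-learning step can be illustrated with a Reptile-style sketch, where each "task" stands in for one topic-ordered trial sequence. The toy linear-regression tasks and function names below are illustrative assumptions, not SPOT's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(true_w):
    """Toy 'trial sequence' task: linear regression with task-specific weights."""
    X = rng.normal(size=(32, 3))
    y = X @ true_w + 0.01 * rng.normal(size=32)
    return X, y

def inner_sgd(w, X, y, lr=0.05, steps=20):
    """Adapt the shared initialization to one task with a few gradient steps."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def reptile(tasks, meta_lr=0.5, meta_steps=200):
    """Reptile: nudge the initialization toward each task's adapted weights."""
    w = np.zeros(3)
    for step in range(meta_steps):
        X, y = tasks[step % len(tasks)]
        w_adapted = inner_sgd(w, X, y)
        w += meta_lr * (w_adapted - w)
    return w

# Tasks share structure (weights near [1, -1, 0]); meta-learning finds a
# starting point from which each task is reachable with minimal updates.
tasks = [make_task(np.array([1.0, -1.0, 0.0]) + 0.1 * rng.normal(size=3))
         for _ in range(5)]
w_meta = reptile(tasks)
```

From `w_meta`, a handful of `inner_sgd` steps suffice to fit any one of the related tasks, which is the "rapid adaptation" property the abstract describes.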
Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and are not adaptable to diverse clinical trial tasks. To address this challenge, we propose a clinical trial foundation model named Panacea, designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching. We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers, to infuse clinical knowledge into the model through pre-training. We further curate TrialInstruct, which contains 200,866 instruction entries for fine-tuning. These resources enable Panacea to be widely applicable to a range of clinical trial tasks based on user requirements. We evaluated Panacea on a new benchmark, named TrialPanorama, which covers eight clinical trial tasks. Our method performed the best on seven of the eight tasks compared to six cutting-edge generic or medicine-specific LLMs.
Clinical trials are vital for evaluating the safety and efficacy of new treatments. However, clinical trials are resource-intensive, time-consuming, and expensive to conduct; errors in trial design, reduced efficacy, and safety events can result in significant delays, financial losses, and damage to reputation. These risks underline the importance of informed, strategic trial-design decisions to mitigate them and improve the chances of a successful trial. Identifying similar historical trials is critical because such trials provide an important reference for potential pitfalls and challenges, including serious adverse events, dosage inaccuracies, recruitment difficulties, and patient adherence issues. Addressing these challenges in trial design can lead to the development of more effective study protocols with optimized patient safety and trial efficiency. In this paper, we present a novel method to identify similar historical trials by summarizing clinical trial protocols and searching for similar trials based on a query trial's protocol. Our approach significantly outperforms all baselines, achieving up to a 78% improvement in recall@1 and a 53% improvement in precision.
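The summarize-then-retrieve idea can be sketched with a bag-of-words similarity as a stand-in for the learned protocol-summary representation; the trial IDs, corpus, and helper names are illustrative, not the paper's method:

```python
import math
from collections import Counter

def embed(text):
    """Bag-of-words vector; a stand-in for a learned protocol-summary embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query_protocol, corpus, k=1):
    """Rank historical trials by similarity to the query trial's protocol."""
    q = embed(query_protocol)
    ranked = sorted(corpus.items(), key=lambda kv: cosine(q, embed(kv[1])),
                    reverse=True)
    return [trial_id for trial_id, _ in ranked[:k]]

# Hypothetical mini-corpus of historical trial protocol summaries.
corpus = {
    "NCT-A": "phase 2 oncology trial of checkpoint inhibitor dosage safety",
    "NCT-B": "phase 3 cardiology trial of statin adherence outcomes",
}
top = search("oncology checkpoint inhibitor phase 2 safety", corpus, k=1)
# top == ["NCT-A"]
```

Metrics such as recall@1 and precision then compare the top-ranked results against annotated similar-trial pairs.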
Clinical trials constitute a critical yet exceptionally challenging and costly stage of drug development (\$2.6B per drug), where protocols are encoded as complex natural language documents, motivating the use of AI systems beyond manual analysis. Existing AI methods accurately predict trial failure but do not provide actionable remedies. To fill this gap, this paper proposes ClinicalReTrial, a multi-agent system that formulates clinical trial optimization as an iterative redesign problem over textual protocols. Our method integrates failure diagnosis, safety-aware modifications, and candidate evaluation in a closed-loop, reward-driven optimization framework. Using the outcome prediction model as a simulation environment, ClinicalReTrial enables low-cost evaluation and dense reward signals for continuous self-improvement. We further propose a hierarchical memory that captures iteration-level feedback within trials and distills transferable redesign patterns across trials. Empirically, ClinicalReTrial improves $83.3\%$ of trial protocols with a mean success probability gain of $5.7\%$ at negligible cost (\$0.12 per trial). Retrospective case studies demonstrate alignment between
Target trial emulation has improved comparative effectiveness research by making the causal question, assumptions, and analysis plan explicit. However, target trial protocols are usually developed iteratively: after examining the data, investigators revise the protocol to reflect which target trials the observational data can realistically support. While this iterative procedure is part of normal scientific practice, it raises concerns about selective choices and invalid statistical inference. A simple procedure, based on sample splitting, can address these concerns. In the initial split, investigators explore the data to define a target trial protocol. Once these choices are made, the protocol is implemented on the second split. Although the investigators made data-informed choices to select the target trial protocol, the resulting inference retains the usual coverage guarantees. The procedure mirrors how trialists move from pilot studies to a phase 3 trial: first, they use data from pilots and early-phase trials to learn and settle on a final protocol; then they implement this protocol and analyze a new set of data in a phase 3 trial.
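The sample-splitting procedure can be sketched on simulated observational data. The data-generating process, the single design choice (an eligibility cutoff), and all variable names are toy assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated observational data: binary treatment A, covariate age, outcome Y.
# The true treatment effect is 1.0.
n = 2000
age = rng.uniform(40, 80, n)
A = rng.integers(0, 2, n)
Y = 1.0 * A + 0.02 * age + rng.normal(size=n)

# Split once, up front.
idx = rng.permutation(n)
explore, confirm = idx[: n // 2], idx[n // 2:]

# Exploration split: make data-informed protocol choices (here, an
# eligibility cutoff chosen after looking at the data).
cutoff = np.median(age[explore])

# Confirmation split: implement the now-fixed protocol and run standard
# inference; the CI keeps its usual coverage guarantee.
mask = age[confirm] >= cutoff
a, y = A[confirm][mask], Y[confirm][mask]
diff = y[a == 1].mean() - y[a == 0].mean()
se = np.sqrt(y[a == 1].var(ddof=1) / (a == 1).sum()
             + y[a == 0].var(ddof=1) / (a == 0).sum())
ci = (diff - 1.96 * se, diff + 1.96 * se)
```

Because the protocol is frozen before the confirmation split is touched, the second-stage analysis is an ordinary pre-specified analysis despite the exploratory first stage.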
Adaptive sample size re-estimation, early stopping, and trial re-design at interim analyses can reduce expected sample sizes in randomised trials. Cluster randomised trials, in which groups of participants are randomly allocated to treatment status, may particularly benefit, as they can be costly and their required sample sizes depend on one or more auxiliary parameters governing correlations within and between clusters, which are often estimated with high uncertainty. We adapt a combination test approach to the cluster trial setting, allowing for early stopping for futility or efficacy and accounting for correlations between trial stages and other nuisance parameters. We consider design decisions for multi-dimensional sample sizes involving clusters, participants, and time, allowing for modifications to intervention roll-out patterns. We use a Pareto optimality approach to balance objectives relating to different components of the sample size and costs. We also examine the interim estimation of auxiliary parameters and trial re-design for efficiency. We illustrate the methods with examples including a stepped-wedge trial re-design and a re-analysis of a large cluster randomised trial.
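The Pareto-optimality idea for multi-dimensional cluster-trial sample sizes can be sketched as follows. The power formula is a textbook normal approximation with a design effect, and the cost parameters are hypothetical, not the paper's model:

```python
import math

def power(k, m, delta=0.3, icc=0.05):
    """Normal-approximation power for a two-arm cluster trial: k clusters per
    arm, m participants per cluster, design effect 1 + (m - 1) * icc."""
    deff = 1 + (m - 1) * icc
    se = math.sqrt(2 * deff / (k * m))
    z = delta / se - 1.96                      # two-sided 5% test
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cost(k, m, per_cluster=500.0, per_participant=50.0):
    """Hypothetical cost: a fixed cost per cluster plus a per-participant cost."""
    return 2 * k * (per_cluster + per_participant * m)

designs = [(k, m) for k in range(4, 30) for m in (5, 10, 20, 40)]
points = [(cost(k, m), -power(k, m), (k, m)) for k, m in designs]

# Keep only Pareto-optimal designs: no alternative design is simultaneously
# at least as cheap and at least as powerful (and different).
front = sorted(p for p in points
               if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                          for q in points))
```

Along the resulting front, power can only be bought with extra cost, which is the menu of non-dominated (clusters, participants) trade-offs the decision-maker chooses from.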
Background: The cost of drug discovery and development is substantial, with clinical trial outcomes playing a critical role in regulatory approval and patient care. However, access to large-scale, high-quality clinical trial outcome data remains limited, hindering advances in predictive modeling and evidence-based decision-making. Methods: We present the Clinical Trial Outcome (CTO) benchmark, a fully reproducible, large-scale repository encompassing approximately 125,000 drug and biologics trials. CTO integrates large language model (LLM) interpretations of publications, trial phase progression tracking, sentiment analysis from news sources, stock price movements of trial sponsors, and additional trial-related metrics. Furthermore, we manually annotated a dataset of clinical trials conducted between 2020 and 2024 to enhance the quality and reliability of outcome labels. Results: The trial outcome labels in the CTO benchmark agree strongly with expert annotations, achieving an F1 score of 94 for Phase 3 trials and 91 across all phases. Additionally, benchmarking standard machine learning models on our manually annotated dataset revealed distribution shifts in recent trials.
Trial engagement effects are effects of trial participation on the outcome that are not mediated by treatment assignment. Most work on extending (generalizing or transporting) causal inferences from a randomized trial to a target population has, explicitly or implicitly, assumed that trial engagement effects are absent, allowing evidence about the effects of the treatments examined in trials to be applied to non-experimental settings. Here, we define novel causal estimands and present identification results for generalizability and transportability analyses in the presence of trial engagement effects. Our approach allows for trial engagement effects under assumptions of no causal interaction between trial participation and treatment assignment on the absolute or relative scale. We show that under these assumptions, even in the presence of trial engagement effects, the trial data can be combined with covariate data from the target population to identify average treatment effects in the context of usual care as implemented in the target population (i.e., outside the experimental setting). The identifying observed data functionals under these no-interaction assumptions are the same as those obtained under the assumption of no trial engagement effects.
Clinical trials are essential for drug development but are extremely expensive and time-consuming to conduct. It is beneficial to study similar historical trials when designing a new clinical trial. However, lengthy trial documents and the lack of labeled data make trial similarity search difficult. We propose a zero-shot clinical trial retrieval method, Trial2Vec, which learns through self-supervision without annotations of similar clinical trials. Specifically, the meta-structure of trial documents (e.g., title, eligibility criteria, target disease) along with clinical knowledge (e.g., the UMLS knowledge base, https://www.nlm.nih.gov/research/umls/index.html) is leveraged to automatically generate contrastive samples. Moreover, Trial2Vec encodes trial documents with respect to this meta-structure, producing compact embeddings that aggregate multi-aspect information from the whole document. We show by visualization that our method yields medically interpretable embeddings, and it achieves a 15% average improvement over the best baselines on precision/recall for trial retrieval, evaluated on our 1,600 labeled trial pairs. In addition, we show that the pre-trained embeddings benefit the downstream trial outcome prediction task.
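The idea of generating contrastive samples from a trial's meta-structure can be sketched like this; the two-attribute scheme, the toy trials, and the sampling of negatives are a simplified assumption, not Trial2Vec's exact procedure:

```python
import random

random.seed(0)

# Toy trial records keyed by (hypothetical) registry IDs, each with two
# meta-structure attributes.
trials = {
    "NCT-1": {"title": "metformin type 2 diabetes glycemic control",
              "criteria": "adults with type 2 diabetes hba1c above 7"},
    "NCT-2": {"title": "statin therapy cardiovascular risk reduction",
              "criteria": "adults with elevated ldl cholesterol"},
}

def contrastive_samples(trials):
    """Positives: two attributes of the same trial (shared semantics).
    Negatives: the same attribute drawn from a different trial."""
    samples = []
    ids = list(trials)
    for tid in ids:
        anchor = trials[tid]["title"]
        positive = trials[tid]["criteria"]
        other = random.choice([t for t in ids if t != tid])
        negative = trials[other]["criteria"]
        samples.append((anchor, positive, negative))
    return samples

triplets = contrastive_samples(trials)
```

An encoder trained on such (anchor, positive, negative) triplets is pushed to place attributes of the same trial close together without any human-labeled similar-trial pairs, which is the self-supervision the abstract describes.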
The clinical trial process, a critical phase in drug development, is essential for developing new treatments. The primary goal of interventional clinical trials is to evaluate the safety and efficacy of drug-based treatments for specific diseases. However, these trials are often lengthy, labor-intensive, and expensive. The duration of a clinical trial significantly impacts overall costs, making efficient timeline management crucial for controlling budgets and ensuring the economic feasibility of research. To address this issue, we propose TrialDura, a machine learning-based method that estimates the duration of clinical trials using multimodal data, including disease names, drug molecules, trial phases, and eligibility criteria. We encode these inputs into Bio-BERT embeddings specifically tuned for biomedical contexts to provide a deeper and more relevant semantic understanding of clinical trial data. Finally, the model's hierarchical attention mechanism connects all of the embeddings to capture their interactions and predict clinical trial duration. Our proposed model demonstrated superior performance, with a mean absolute error (MAE) of 1.04 years and a root mean square error (RMSE)
The "free trial" followed by automatic renewal is a dominant business model in the digital economy. Standard models explain trials as a mechanism for consumers to learn their valuation of a product. We propose a complementary theory based on the rational inattention framework: consumers know their valuation but face a cognitive cost to remember to cancel an unwanted subscription. We model this using a Shannon entropy-based cost of information processing, where a consumer's baseline attention level decays with the length of the trial period. This creates a novel trade-off for a monopolist firm: a longer trial increases "inattentive revenue" from consumers who fail to cancel, but it also lowers ex-ante consumer utility, making the initial offer less attractive. We show that this trade-off leads to an interior optimal trial length, even for products where value-learning is instantaneous. Under standard assumptions about demand elasticity and the distribution of consumer valuations, our model generates sharp, testable predictions about the relationship between contract terms. We find that the optimal renewal price and trial length are complements: firms offering longer trials will also set higher renewal prices.
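The interior-optimum trade-off can be illustrated numerically. The functional forms below (exponential attention decay, a linear participation response) are toy assumptions chosen for transparency, not the paper's entropy-based model:

```python
import math

def forget_prob(T, decay=0.4):
    """Chance a subscriber fails to cancel; attention decays with trial length T."""
    return 1 - math.exp(-decay * T)

def adoption(T, price, value=1.0):
    """Ex-ante take-up: the expected loss from forgetting to cancel makes
    longer trials less attractive up front."""
    return max(0.0, 1.0 - forget_prob(T) * price / value)

def profit(T, price):
    """Revenue comes only from consumers who sign up and then fail to cancel."""
    return adoption(T, price) * forget_prob(T) * price

# Grid-search trial lengths: the two opposing forces yield an interior optimum.
best_T, best_pi = max(((t / 10, profit(t / 10, price=0.8))
                       for t in range(1, 101)),
                      key=lambda tp: tp[1])
```

With these parameters the profit function is single-peaked in the trial length: very short trials generate little inattentive revenue, very long ones deter sign-ups, and the maximizer sits strictly inside the grid.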
Clinical trials are pivotal for developing new medical treatments but typically carry risks, such as patient mortality and enrollment failure, that can waste immense effort spanning over a decade. Applying artificial intelligence (AI) to predict key events in clinical trials holds great potential for providing insights to guide trial design. However, complex data collection and question definitions requiring medical expertise have so far hindered the involvement of AI. This paper tackles these challenges by presenting a comprehensive suite of 23 meticulously curated, AI-ready datasets covering multimodal input features and 8 crucial prediction challenges in clinical trial design: prediction of trial duration, patient dropout rate, serious adverse events, mortality rate, trial approval outcome, and trial failure reason, as well as drug dose finding and eligibility criteria design. Furthermore, we provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design, ultimately advancing clinical trial research.
Analyzing data from past clinical trials is part of the ongoing effort to optimize the design, implementation, and execution of new clinical trials and to more efficiently bring life-saving interventions to market. While there have been recent advances in the generation of static-context synthetic clinical trial data, the generation of fine-grained, time-sequential synthetic clinical trial data has been challenging, due to both limited patient availability and constraints imposed by patient privacy needs. Because patient trajectories over an entire clinical trial are of high importance for optimizing trial design and preventing harmful adverse events, there is a significant need for the generation of high-fidelity time-sequence clinical trial data. Here we introduce TrialSynth, a Variational Autoencoder (VAE) designed to address the specific challenges of generating synthetic time-sequence clinical trial data. Distinct from related clinical data VAE methods, the core of our method leverages Hawkes Processes (HP), which are particularly well-suited for the event-type and time-gap prediction needed to capture the structure of sequential clinical trial data.
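Hawkes-process event simulation, the building block the abstract describes, can be sketched with Ogata's thinning algorithm for an exponential kernel. This is the standard construction, not TrialSynth's full VAE; the parameters are illustrative:

```python
import math
import random

random.seed(42)

def simulate_hawkes(mu, alpha, beta, horizon):
    """Ogata thinning for a univariate Hawkes process with exponential kernel:
    lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).
    Between events the intensity only decays, so the intensity at the current
    time is a valid upper bound for the next candidate."""
    events, t = [], 0.0
    while t < horizon:
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += random.expovariate(lam_bar)       # candidate from the upper bound
        if t >= horizon:
            break
        lam_t = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        if random.random() <= lam_t / lam_bar:
            events.append(t)                   # accept: intensity jumps by alpha
    return events

# Subcritical regime (alpha / beta < 1), so event cascades stay finite.
events = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0)
```

The self-exciting term is what makes Hawkes processes apt for clinical event streams: each event (e.g., an adverse event) temporarily raises the intensity of further events, producing the clustered timing patterns seen in trial data.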
We discuss generalizability analyses under a partially nested trial design, where part of the trial is nested within a cohort of trial-eligible individuals, while the rest of the trial is not nested. This design arises, for example, when only some centers participating in a trial are able to collect data on non-randomized individuals, or when data on non-randomized individuals cannot be collected for the full duration of the trial. Our work is motivated by the Necrotizing Enterocolitis Surgery Trial (NEST), which compared initial laparotomy versus peritoneal drain for infants with necrotizing enterocolitis or spontaneous intestinal perforation. During the first phase of the study, data were collected from randomized individuals as well as consenting non-randomized individuals; during the second phase, however, data were collected only from randomized individuals, resulting in a partially nested trial design. We propose methods for generalizability analyses with partially nested trial designs. We describe identification conditions and propose estimators for causal estimands in the target population of all trial-eligible individuals, both randomized and non-randomized.
Clinical imaging trials play a crucial role in advancing medical innovation but are often costly, inefficient, and ethically constrained. Virtual Imaging Trials (VITs) present a solution by simulating clinical trial components in a controlled, risk-free environment. The Virtual Lung Screening Trial (VLST), an in silico study inspired by the National Lung Screening Trial (NLST), illustrates the potential of VITs to expedite clinical trials, minimize risks to participants, and promote optimal use of imaging technologies in healthcare. This study aimed to show that a virtual imaging trial platform could investigate key elements of a major clinical trial, specifically the NLST, which compared computed tomography (CT) and chest radiography (CXR) for lung cancer screening. Using simulated cancerous lung nodules, a virtual patient cohort of 294 subjects was created from XCAT human models. Each virtual patient underwent both CT and CXR imaging, with deep learning models, the AI CT-Reader and AI CXR-Reader, acting as virtual readers to recall patients with suspected lung cancer. The primary outcome was the difference in diagnostic performance between CT and CXR.
The target trial framework enables causal inference from longitudinal observational data by emulating randomized trials initiated at multiple time points. Precision is often improved by pooling information across trials, with standard models typically assuming, among other things, a time-constant treatment effect. However, this obscures interpretation when the true treatment effect varies, which we argue is likely as a consequence of relying on noncollapsible estimands. To address these challenges, this paper introduces a model-free strategy for target trial analysis, centered on the choice of estimand rather than on model specification. This ensures that treatment effects remain clearly interpretable for well-defined populations even under model misspecification. We propose estimands suitable for different study designs and develop accompanying G-computation and inverse probability weighted estimators. Applications to simulations and real data on antimicrobial de-escalation in an intensive care unit setting demonstrate the greater clarity and reliability of the proposed methodology over traditional techniques.
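One of the estimator families mentioned, inverse probability weighting, can be sketched on simulated confounded data. This is a generic Hajek-style IPW estimate of an average treatment effect with a known propensity score, not the paper's full target-trial estimator:

```python
import numpy as np

rng = np.random.default_rng(7)

# Confounded data: L drives both treatment A and outcome Y; true effect is 2.
n = 5000
L = rng.normal(size=n)
p = 1 / (1 + np.exp(-1.5 * L))          # propensity: treatment depends on L
A = rng.binomial(1, p)
Y = 2.0 * A + 3.0 * L + rng.normal(size=n)

# Naive contrast is biased because treated units tend to have high L.
naive = Y[A == 1].mean() - Y[A == 0].mean()

# IPW: reweight each arm by the inverse of its propensity so both arms
# represent the full population (Hajek / normalized-weights form).
w1 = A / p
w0 = (1 - A) / (1 - p)
ipw = (w1 * Y).sum() / w1.sum() - (w0 * Y).sum() / w0.sum()
```

In practice the propensity score is itself estimated (e.g., by logistic regression), and the G-computation alternative instead models the outcome in each arm and averages predictions over the population.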
Recent advances in LLMs have greatly improved general-domain NLP tasks, yet their adoption in critical domains, such as clinical trial recruitment, remains limited. As trials are designed in natural language and patient data are represented as both structured and unstructured text, the task of matching trials and patients benefits from the knowledge aggregation and reasoning abilities of LLMs. Classical approaches are trial-specific, whereas LLMs, with their ability to consolidate distributed knowledge, hold the potential to build a more general solution. Yet recent applications of LLM-assisted methods rely on proprietary models and weak evaluation benchmarks. In this survey, we are the first to analyze the task of trial-patient matching and to contextualize emerging LLM-based approaches in clinical trial recruitment. We critically examine existing benchmarks, approaches, and evaluation frameworks, discuss the challenges of adopting LLM technologies in clinical research, and outline exciting future directions.
The Learn-As-you-GO (LAGO) design provides a rigorous framework for adapting the intervention package based on accumulating data while the trial is ongoing. This article improves the flexibility of the LAGO design by incorporating statistical power as an optimization criterion (a power goal) in LAGO optimizations. We propose unconditional and conditional power approaches for adding a power goal. Both approaches estimate the power at the end of the LAGO trial using data from prior stages and increase it when the original trial was underpowered. Including a power goal maintains the asymptotic properties of the treatment effect estimators while preserving the asymptotic level of the statistical test at the end of the trial. We illustrate the benefits of our methods through a retrospective application to the BetterBirth Study, a large-scale study of maternal-newborn care that failed to show a significant effect on its primary outcome. This analysis demonstrates how our methods could have led to more intensive interventions and potentially significant results. The LAGO design with power goal optimizations provides investigators with a powerful tool.
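The core idea of projecting end-of-trial power from interim data and scaling up when underpowered can be sketched as follows. The normal-approximation power formula and the multiplicative growth rule are illustrative assumptions, not the paper's LAGO optimization:

```python
import math

def projected_power(effect_hat, sd, n_per_arm):
    """Unconditional-power-style projection: probability a one-sided 5% test
    rejects at the end of the trial, plugging in the interim effect estimate."""
    se = sd * math.sqrt(2.0 / n_per_arm)
    z = effect_hat / se - 1.645
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def rescale_for_power(effect_hat, sd, n_planned, target=0.8):
    """If the projection falls short of the target, grow the remaining
    per-arm sample size until the projected power reaches it."""
    n = n_planned
    while projected_power(effect_hat, sd, n) < target and n < 20 * n_planned:
        n = int(n * 1.1) + 1
    return n

# Interim estimate of 0.2 SD; the original plan of 100 per arm is underpowered.
n_new = rescale_for_power(effect_hat=0.2, sd=1.0, n_planned=100, target=0.8)
```

The paper's contribution is to fold such a power goal into the LAGO optimization while preserving the asymptotic validity of the final test, which this plug-in sketch does not itself establish.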
Platform trials evaluate multiple experimental treatments against a common control group (and/or against each other), which often reduces the trial duration and sample size. Bayesian platform designs offer several practical advantages, including the flexible addition or removal of experimental arms using posterior probabilities and the incorporation of prior/external information. Regulatory agencies require that the operating characteristics of Bayesian designs be assessed by estimating the sampling distribution of posterior probabilities via Monte Carlo simulation. It is computationally intensive to repeat this simulation process for all design configurations considered, particularly for platform trials with complex interim decision procedures. In this paper, we propose an efficient method to assess operating characteristics and determine sample sizes, as well as other design parameters, for Bayesian platform trials. We prove theoretical results that allow us to model the joint sampling distribution of posterior probabilities across multiple endpoints and trial stages using simulations conducted at only two sample sizes. This work is motivated by design complexities in the SSTARLET trial.
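The brute-force exercise the paper accelerates, estimating the sampling distribution of posterior probabilities by Monte Carlo, can be sketched for a single-arm beta-binomial design. The design parameters are hypothetical, and the nested simulation loop below is exactly the per-configuration cost the proposed method avoids repeating:

```python
import random

random.seed(3)

def posterior_prob(successes, n, theta0=0.3, a=1.0, b=1.0, draws=1000):
    """P(theta > theta0 | data) for a Beta(a, b) prior and binomial likelihood,
    approximated by sampling from the Beta posterior."""
    post = (random.betavariate(a + successes, b + n - successes)
            for _ in range(draws))
    return sum(t > theta0 for t in post) / draws

def pr_declare_efficacy(theta_true, n=60, threshold=0.975, sims=300):
    """Operating characteristic: how often the design declares efficacy when
    the true response rate is theta_true."""
    declare = 0
    for _ in range(sims):
        x = sum(random.random() < theta_true for _ in range(n))
        if posterior_prob(x, n) > threshold:
            declare += 1
    return declare / sims

type1 = pr_declare_efficacy(theta_true=0.3)   # at the null rate: should be small
power = pr_declare_efficacy(theta_true=0.5)   # at the alternative: should be large
```

In a real platform design this loop multiplies across arms, endpoints, interim looks, and every candidate sample size, which is why modeling the joint distribution of posterior probabilities from only two simulated sample sizes is a meaningful saving.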
Causal inference is the goal of randomized trials and many observational studies. The first step in a formal causal inference framework is to define the causal estimand, and in both types of study this can be intuitively defined as the effect in an ideal trial: a hypothetical perfect randomized experiment (with a representative sample, perfect adherence, etc.). The target trial framework is increasingly used for causal inference in observational studies, but clarity is lacking on how a target trial should be specified and how it relates to an ideal trial. In this paper, we review the concept of the ideal trial and highlight the need, when specifying it, to balance relevance for decision-making in the real world against feasibility of estimation. We then consider how a target trial should be specified, outlining the challenges of a recommended approach, commonly seen in applications, that focuses heavily on feasibility of estimation: specifying the target trial so that it is closely aligned with the observational data (e.g., uses the same eligibility criteria). We argue that with this "aligned" approach, biases may remain relative to the estimand of ultimate practical interest.