In the era of big data, secondary outcomes have become increasingly important alongside primary outcomes. These secondary outcomes, which can be derived from traditional endpoints in clinical trials, compound measures, or risk prediction scores, hold the potential to enhance the analysis of primary outcomes. Our method is motivated by the challenge of utilizing multiple secondary outcomes, such as blood biochemistry markers and urine assays, to improve the analysis of the primary outcome related to liver health. Current integration methods often fall short, as they impose strong model assumptions or require prior knowledge to construct over-identified working functions. This paper addresses these statistical challenges and potentially opens a new avenue in data integration by introducing a novel integrative learning framework that is applicable in a general setting. The proposed framework allows for the robust, data-driven integration of information from multiple secondary outcomes, promotes the development of efficient learning algorithms, and ensures optimal use of available data. Extensive simulation studies demonstrate that the proposed method significantly reduces variance in
This paper provides a nonparametric framework for causal inference with categorical outcomes under binary treatment and binary instrument settings. I decompose the observed joint probability of outcomes and treatment into marginal probabilities of potential outcomes and treatment, and association parameters that capture selection bias due to unobserved heterogeneity. Under a novel identifying assumption \emph{association similarity}, which requires the dependence between unobserved factors driving treatment and potential outcomes to be invariant across treatment states, I achieve point identification of the full distribution of potential outcomes. Recognizing that this assumption may be strong in some contexts, I propose two weaker alternatives: monotonic association, which restricts the direction of selection heterogeneity, and bounded association, which constrains its magnitude. These relaxed assumptions deliver sharp partial identification bounds that nest point identification as a special case and facilitate transparent sensitivity analysis. I illustrate the framework in an empirical application, estimating the causal effect of private health insurance on health outcomes.
The probability of necessity (PN), which quantifies the probability that an observed event would not have occurred in the absence of the treatment, is a central estimand in attribution analysis. While PN has been extensively studied for binary outcomes and has recently been developed for ordinal outcomes, a formal framework for continuous outcomes remains underdeveloped. To address this gap, we propose the general probability of necessity (GPN) for continuous outcomes, a setting that is substantially more challenging than the binary and ordinal cases. Rather than imposing strong identifiability assumptions, we adopt a partial identification perspective and derive sharp lower and upper bounds under standard assumptions of ignorability and monotonicity. We further introduce a copula-based framework that exploits dependence information between potential outcomes to tighten these bounds. Simulation studies and real-world applications demonstrate the effectiveness of our method.
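For orientation, the binary case that the GPN generalizes has a simple closed form. Under exogeneity, the classical Tian and Pearl bounds on PN for a binary treatment and a binary outcome need only P(Y = 1 | X = 1) and P(Y = 1 | X = 0), and adding monotonicity pins PN to the lower bound. The sketch below covers only that textbook binary case, not the continuous GPN bounds or the copula refinement. The function name `pn_bounds` is our own.

```python
def pn_bounds(p_y_given_x, p_y_given_not_x):
    """Bounds on the probability of necessity for binary treatment and outcome.

    Assumes exogeneity (no confounding of X and Y).
    p_y_given_x:     P(Y = 1 | X = 1)
    p_y_given_not_x: P(Y = 1 | X = 0)
    Under the additional monotonicity assumption, PN equals the lower bound.
    """
    # Lower bound: excess risk ratio, floored at 0.
    lower = max(0.0, 1.0 - p_y_given_not_x / p_y_given_x)
    # Upper bound: P(Y = 0 | X = 0) / P(Y = 1 | X = 1), capped at 1.
    upper = min(1.0, (1.0 - p_y_given_not_x) / p_y_given_x)
    return lower, upper


lo_a, hi_a = pn_bounds(0.8, 0.2)  # approximately (0.75, 1.0)
lo_b, hi_b = pn_bounds(0.9, 0.6)  # approximately (1/3, 4/9)
```

In the binary case, the gap between the two bounds is exactly what monotonicity removes. The continuous setting in the abstract has no such single threshold event, which is why the GPN needs its own bounds.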
With multiple outcomes in empirical research, a common strategy is to define a composite outcome as a weighted average of the original outcomes. However, the choice of weights is often subjective and can be controversial. We propose an inverse regression strategy for causal inference with multiple outcomes. The key idea is to regress the treatment on the outcomes, which is the inverse of the standard regression of the outcomes on the treatment. Although this strategy is simple and even counterintuitive, it has several advantages. First, testing for zero coefficients of the outcomes is equivalent to testing the null hypothesis of zero effects, even though the inverse regression model is misspecified. Second, the coefficients of the outcomes provide a data-driven choice of weights for defining a composite outcome. Third, this strategy is applicable to general study designs. We also discuss the associated inference issues, and we illustrate the theory in both randomized experiments and observational studies.
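The following sketch illustrates the inverse regression idea in a simulated randomized experiment. The binary treatment is regressed on an intercept and two outcomes by ordinary least squares. Because the inverse regression is not a correct model, heteroskedasticity-robust (HC0) standard errors are used; this is our assumption, not necessarily the paper's inference procedure. The data, the function name `inverse_regression`, and the weight normalization are all hypothetical.

```python
import numpy as np


def inverse_regression(z, outcomes):
    """OLS of treatment z on an intercept plus outcome columns, with HC0 robust SEs."""
    X = np.column_stack([np.ones(len(z)), outcomes])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ coef
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)  # sum of e_i^2 x_i x_i^T
    se = np.sqrt(np.diag(bread @ meat @ bread))
    return coef, se


rng = np.random.default_rng(0)
n = 5000
z = rng.binomial(1, 0.5, size=n).astype(float)  # randomized binary treatment
y1 = 0.5 * z + rng.normal(size=n)               # outcome the treatment affects
y2 = rng.normal(size=n)                         # outcome the treatment does not affect

coef, se = inverse_regression(z, np.column_stack([y1, y2]))
t_stats = coef[1:] / se[1:]  # Wald statistics for "zero coefficient" tests
# One possible (hypothetical) normalization of the coefficients into composite weights.
weights = coef[1:] / np.abs(coef[1:]).sum()
```

Here the coefficient on `y1` should be clearly positive and the one on `y2` close to zero, so the implied composite outcome puts almost all its weight on the affected outcome.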
We introduce Lehmer parking functions and study their set of parking outcomes. Our main results establish that the number of outcomes of Lehmer parking functions of length $n$ is given by a Bell number, which is exactly the number of set partitions of an $n$-element set. We also show that the number of outcomes of weakly decreasing Lehmer parking functions is given by a Catalan number, which counts a subset of the set partitions of an $n$-element set known as non-crossing set partitions.
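The abstract does not define Lehmer parking functions, so the sketch below does not enumerate them. It only checks the two counting sequences named above by brute force. All set partitions of {1, ..., n} give the Bell numbers. The partitions with no crossing (no a < b < c < d with a, c in one block and b, d in another) give the Catalan numbers. These are usually called non-crossing partitions, which we take to be the subset the abstract refers to.

```python
from itertools import combinations


def set_partitions(elements):
    """Yield every set partition of a list, as a list of blocks (lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Either `first` starts a new block...
        yield [[first]] + partition
        # ...or it joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def is_noncrossing(partition):
    """True if no a < b < c < d has a, c in one block and b, d in a different block."""
    block_of = {x: i for i, block in enumerate(partition) for x in block}
    for a, b, c, d in combinations(sorted(block_of), 4):  # tuples come out increasing
        if block_of[a] == block_of[c] != block_of[b] == block_of[d]:
            return False
    return True


counts = []
for n in range(1, 6):
    parts = list(set_partitions(list(range(1, n + 1))))
    counts.append((len(parts), sum(is_noncrossing(p) for p in parts)))
# counts == [(1, 1), (2, 2), (5, 5), (15, 14), (52, 42)]: Bell vs. Catalan numbers
```

The first divergence is at n = 4, where the single crossing partition {1, 3}, {2, 4} accounts for the gap between 15 and 14.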
Predicting potential and counterfactual outcomes from observational data is central to individualized decision-making, particularly in clinical settings where treatment choices must be tailored to each patient rather than guided solely by population averages. We propose PO-Flow, a continuous normalizing flow (CNF) framework for causal inference that jointly models potential outcome distributions and factual-conditioned counterfactual outcomes. Trained via flow matching, PO-Flow provides a unified approach to individualized potential outcome prediction, conditional average treatment effect estimation, and counterfactual prediction. By encoding an observed factual outcome and decoding under an alternative treatment, PO-Flow provides an encode-decode mechanism for factual-conditioned counterfactual prediction. In addition, PO-Flow supports likelihood-based evaluation of potential outcomes, enabling uncertainty-aware assessment of predictions. A supporting recovery guarantee is established under certain assumptions, and empirical results on benchmark datasets demonstrate strong performance across a range of causal inference tasks within the potential outcomes framework.
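To make the encode-decode mechanism concrete, here is a minimal sketch. It assumes a treatment-conditioned velocity field v(z, t | x, a) that transports a base sample at t = 0 to an outcome at t = 1. The factual outcome is integrated backward under the factual treatment, then forward under the alternative one. PO-Flow's network and training are not shown. The hand-written `velocity`, its coefficients `beta`, and the fixed-step Euler solver are hypothetical stand-ins chosen so that the answer can be checked by hand.

```python
def velocity(z, t, x, a):
    """Hypothetical stand-in for a learned velocity field v(z, t | x, a).

    Toy choice: a drift depending only on covariate x and treatment a,
    so the time-1 map is z1 = z0 + beta[a] * x.
    """
    beta = {0: 1.0, 1: 3.0}
    return beta[a] * x


def integrate(z, x, a, t_start, t_end, n_steps=100):
    """Euler-integrate dz/dt = velocity(z, t, x, a) from t_start to t_end.

    A negative step (t_end < t_start) runs the flow backward in time.
    """
    h = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        z = z + h * velocity(z, t, x, a)
        t += h
    return z


def counterfactual(y_factual, x, a_factual, a_alt):
    """Encode the factual outcome under its own treatment, decode under another."""
    z0 = integrate(y_factual, x, a_factual, 1.0, 0.0)  # encode: outcome -> base sample
    return integrate(z0, x, a_alt, 0.0, 1.0)           # decode: base sample -> outcome


y_cf = counterfactual(2.0, x=0.5, a_factual=0, a_alt=1)  # 2.0 - 0.5 + 1.5 = 3.0, up to rounding
```

With a learned, state-dependent velocity field, one would use a more accurate ODE solver. The structure of the computation would be the same: the latent `z0` carries the individual-specific information shared by both potential outcomes.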
Scholars of social stratification often study exposures that shape life outcomes. But some outcomes (such as wage) only exist for some people (such as those who are employed). We show how a common practice -- dropping cases with non-existent outcomes -- can obscure causal effects when a treatment affects both outcome existence and outcome values. The effects of both beneficial and harmful treatments can be underestimated. Drawing on existing approaches for principal stratification, we show how to study (1) the average effect on whether an outcome exists and (2) the average effect on the outcome among the latent subgroup whose outcome would exist in either treatment condition. To extend our approach to the selection-on-observables settings common in applied research, we develop a framework involving regression and simulation to enable principal stratification estimates that adjust for measured confounders. We illustrate through an empirical example about the effects of parenthood on labor market outcomes.
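A small worked example shows how dropping cases can shrink a harmful effect. We take a hypothetical randomized population in which treatment lowers wages by 1 for people who would work either way. It also pushes some low-wage workers out of employment. All strata, shares, and wages below are invented for illustration, and exact fractions are used so the arithmetic is transparent.

```python
from fractions import Fraction as F

# Hypothetical principal strata, defined by employment under (control, treated).
# Tuple: (population share, employed if control, employed if treated,
#         wage if control, wage if treated); None marks a wage that does not exist.
strata = {
    "always_employed":          (F(6, 10), True,  True,  F(3), F(2)),
    "employed_only_if_control": (F(2, 10), True,  False, F(1), None),
    "never_employed":           (F(2, 10), False, False, None, None),
}


def mean_wage_among_employed(treated):
    """Population mean wage among the employed in one arm of a randomized comparison."""
    emp, wage = (2, 4) if treated else (1, 3)
    total = sum(s[0] for s in strata.values() if s[emp])
    return sum(s[0] * s[wage] for s in strata.values() if s[emp]) / total


# Common practice: drop people without a wage, then compare arms.
naive = mean_wage_among_employed(True) - mean_wage_among_employed(False)

# Principal-stratification targets: (1) effect on whether a wage exists,
# (2) wage effect among those who would be employed in either condition.
employment_effect = sum(s[0] * (int(s[2]) - int(s[1])) for s in strata.values())
_, _, _, w0, w1 = strata["always_employed"]
always_employed_effect = w1 - w0

print(naive, employment_effect, always_employed_effect)  # -1/2 -1/5 -1
```

The naive contrast (-1/2) understates the wage loss among the always-employed (-1). The low-wage workers who leave employment under treatment remain only in the control-arm average, which pulls that average down.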
Probabilities of causation provide explanatory information on the observed occurrence (causal necessity) and non-occurrence (causal sufficiency) of events. Here, we adapt these probabilities (probability of necessity, probability of sufficiency, and probability of necessity and sufficiency) to an important class of epidemiologic outcomes, post-infection outcomes. A defining feature of studies on these outcomes is that they account for the post-treatment variable, infection acquisition, which means that, for individuals who remain uninfected, the outcome is not defined. Following previous work by Hudgens and Halloran, we describe analyses of post-infection outcomes using the principal stratification framework, and then derive expressions for the probabilities of causation in terms of principal strata-related parameters. Finally, we show that these expressions provide insights into the contributions of different processes (absence or occurrence of infection, and disease severity), implicitly encoded in the definition of the outcome, to causation.
Regression problems with bounded continuous outcomes frequently arise in real-world statistical and machine learning applications, such as the analysis of rates and proportions. A central challenge in this setting is predicting a response associated with a new covariate value. Most of the existing statistical and machine learning literature has focused either on point prediction of bounded outcomes or on interval prediction based on asymptotic approximations. We develop conformal prediction intervals for bounded outcomes based on transformation models and beta regression. We introduce tailored non-conformity measures based on residuals that are aligned with the underlying models, and account for the inherent heteroscedasticity in regression settings with bounded outcomes. We present a theoretical result on asymptotic marginal and conditional validity in the context of full conformal prediction, which remains valid under model misspecification. For split conformal prediction, we provide an empirical coverage analysis based on a comprehensive simulation study. The simulation study demonstrates that both methods provide valid finite-sample predictive coverage, including settings with
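As a rough illustration of split conformal prediction via a transformation model, the sketch below fits ordinary least squares to logit(y) on a training split. It then scores a calibration split by absolute residuals on the logit scale, takes the finite-sample corrected quantile, and maps the interval back into (0, 1). The plain absolute residual is our simplification; the paper's tailored non-conformity measures and the beta regression variant are not reproduced. The simulated data and function names are hypothetical.

```python
import numpy as np


def logit(p):
    return np.log(p / (1 - p))


def expit(z):
    return 1 / (1 + np.exp(-z))


def split_conformal_bounded(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal interval for outcomes in (0, 1) via a logit-scale linear fit."""
    X_tr = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(X_tr, logit(y_train), rcond=None)

    def predict(x):
        return coef[0] + coef[1] * x

    # Non-conformity scores: absolute residuals on the transformed scale.
    scores = np.abs(logit(y_cal) - predict(x_cal))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    center = predict(x_new)
    if k > n:  # too few calibration points: fall back to the whole unit interval
        return np.zeros_like(center), np.ones_like(center)
    q = np.sort(scores)[k - 1]
    # Symmetric on the logit scale, hence asymmetric and bounded on (0, 1).
    return expit(center - q), expit(center + q)


rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=3000)
y = expit(0.5 + 1.0 * x + 0.3 * rng.normal(size=3000))
lo, hi = split_conformal_bounded(x[:1000], y[:1000], x[1000:2000], y[1000:2000], x[2000:])
coverage = np.mean((y[2000:] >= lo) & (y[2000:] <= hi))
```

Back-transforming keeps every interval inside (0, 1) and narrows it near the boundaries. This is one simple way to reflect the heteroscedasticity that bounded outcomes typically show.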
We show that the quadratic measure need not be postulated, but follows from the compatibility of two structural features of physical processes: linear reversible evolution prior to the formation of persistent records, and multiplicative composition of outcome weights once such records are established. Reversible evolution combines configurations additively at the level of a compatibility parameter, while the formation of persistent records induces a multiplicative structure on the weights assigned to physically realized outcomes. Requiring consistency between these two regimes constrains the admissible weight assignment to be quadratic in the associated amplitude. The Born rule therefore emerges as the unique measure compatible with reversible linear evolution and irreversible record formation, without assuming a probabilistic interpretation or a specific quantum formalism.
We develop a model of algorithmic pricing that shuts down every channel for explicit or implicit collusion while still generating collusive outcomes. We analyze the dynamics of a duopoly market where both firms use pricing algorithms consisting of a parameterized family of model specifications. The firms update both the parameters and the weights on models to adapt endogenously to market outcomes. We show that the market experiences recurrent episodes where both firms set prices at collusive levels. We analytically characterize the dynamics of the model, using large deviation theory to explain the recurrent episodes of collusive outcomes. Our results show that collusive outcomes may be a recurrent feature of algorithmic environments with complementarities and endogenous adaptation, providing a challenge for competition policy.
An individualized treatment rule (ITR) is a decision rule that recommends treatments for patients based on their individual feature variables. In many applications, the ideal ITR for the primary outcome is also expected to cause minimal harm to secondary outcomes. Therefore, our objective is to learn an ITR that not only maximizes the value function for the primary outcome but also approximates the optimal rules for the secondary outcomes as closely as possible. To achieve this goal, we introduce a fusion penalty that encourages the ITRs based on different outcomes to yield similar recommendations. Two algorithms are proposed to estimate the ITR using surrogate loss functions. We prove that the agreement rate between the estimated ITR of the primary outcome and the optimal ITRs of the secondary outcomes converges to the true agreement rate faster than it would if the secondary outcomes were not taken into consideration. Furthermore, we derive non-asymptotic properties of the value function and misclassification rate for the proposed method. Finally, simulation studies and a real data example demonstrate the finite-sample performance of the proposed method.
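The two quantities in the abstract, the value function of a rule and the agreement rate between two rules, can be estimated directly in a randomized trial. The sketch uses the standard inverse-probability-weighted value estimate, assuming a known propensity of 0.5. The fusion penalty and surrogate-loss algorithms are not shown. Both rules and the simulated data are hypothetical.

```python
import numpy as np


def ipw_value(y, a, x, rule, propensity=0.5):
    """IPW estimate of the value E[Y(d(X))] of rule d; binary a with known P(A=1|X)."""
    d = rule(x)
    pi_a = np.where(a == 1, propensity, 1 - propensity)
    return np.mean(y * (a == d) / pi_a)


def agreement_rate(rule_1, rule_2, x):
    """Fraction of subjects for whom the two rules recommend the same treatment."""
    return np.mean(rule_1(x) == rule_2(x))


def rule_primary(x):    # hypothetical rule for the primary outcome
    return (x > 0).astype(int)


def rule_secondary(x):  # hypothetical rule for a secondary outcome
    return (x > 0.5).astype(int)


rng = np.random.default_rng(2)
n = 20000
x = rng.uniform(-1, 1, size=n)
a = rng.binomial(1, 0.5, size=n)
y = 1 + a * x + 0.5 * rng.normal(size=n)  # treatment helps exactly when x > 0

value = ipw_value(y, a, x, rule_primary)  # true value: 1 + E[max(x, 0)] = 1.25
agree = agreement_rate(rule_primary, rule_secondary, x)  # true rate: 0.75
```

A fusion penalty of the kind described would trade a little of `value` for a higher `agree`, pulling the primary rule's decision boundary toward the secondary one.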
Assessing the causal effects of interventions on ordinal outcomes is an important objective of many educational and behavioral studies. Under the potential outcomes framework, we can define causal effects as comparisons between the potential outcomes under treatment and control. However, the average causal effect, often the parameter of interest, is difficult to interpret for ordinal outcomes. To address this challenge, we propose two causal parameters, defined as the probabilities that the treatment is beneficial and strictly beneficial for the experimental units. Although well-defined for any outcomes and of particular interest for ordinal outcomes, these two parameters depend on the association between the potential outcomes and are therefore not identifiable from the observed data without additional assumptions. Echoing recent advances in the econometrics and biostatistics literature, we present the sharp bounds of the aforementioned causal parameters for ordinal outcomes, under fixed marginal distributions of the potential outcomes. Because the causal estimands and their corresponding sharp bounds are based on the potentia
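To see what "sharp bounds under fixed marginals" means for the first parameter, P(Y(1) ≥ Y(0)), consider ordered categories 1, ..., J where larger is better (our assumption). Both bounds can be obtained as max-flow problems over all couplings of the two marginal distributions. The min-cut side reduces to the cumulative-distribution formulas in the sketch. We derived these formulas independently from that argument and have not checked them against the paper's own expressions. The function name is hypothetical.

```python
from itertools import accumulate


def bounds_prob_not_worse(p1, p0):
    """Sharp bounds on P(Y(1) >= Y(0)) given only the two marginal pmfs.

    p1[k], p0[k]: P(Y(1) = k + 1) and P(Y(0) = k + 1) for ordered categories 1..J.
    Lower bound:  max(0, max_m [F0(m) - F1(m - 1)])
    Upper bound:  min(1, 1 + min_m [F0(m) - F1(m)])
    where F denotes the CDF and m runs over 1..J.
    """
    F1 = [0.0] + list(accumulate(p1))  # F1[k] = P(Y(1) <= k), F1[0] = 0
    F0 = [0.0] + list(accumulate(p0))  # F0[k] = P(Y(0) <= k), F0[0] = 0
    J = len(p1)
    lower = max(0.0, max(F0[m] - F1[m - 1] for m in range(1, J + 1)))
    upper = min(1.0, 1.0 + min(F0[m] - F1[m] for m in range(1, J + 1)))
    return lower, upper


# Treatment shifts mass toward higher categories: bounds (0.6, 1.0).
lo_up, hi_up = bounds_prob_not_worse([0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
# Treatment shifts mass toward lower categories: bounds (0.2, 0.7).
lo_down, hi_down = bounds_prob_not_worse([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])
```

The width of each interval reflects exactly the unidentified association between Y(1) and Y(0) that the abstract describes. Even with the same marginals, different couplings move P(Y(1) ≥ Y(0)) anywhere within these limits.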
Background. Dengue outbreaks are a major public health issue, with Brazil reporting 71% of global cases in 2024. Purpose. This study aims to describe the profile of patients with severe dengue admitted to Brazilian intensive care units (ICUs) from 2012 to 2024, assess trends over time, describe complications newly arising during the ICU stay, and identify risk factors at admission for developing such complications. Methods. We performed a prospective study of dengue patients from 253 ICUs across 56 hospitals. We used descriptive statistics to describe the dengue ICU population, logistic regression to identify risk factors for complications during the ICU stay, and a machine learning framework to predict the risk of developing complications. Visualisations were generated using ISARIC VERTEX. Results. Of 11,047 admissions, 1,117 (10.1%) developed complications, including non-invasive ventilation (437 admissions), invasive ventilation (166), vasopressor use (364), blood transfusion (353), and renal replacement therapy (103). Age >80 years (OR: 3.10, 95% CI: 2.02-4.92), chronic kidney disease (OR: 2.94, 2.22-3.89), liver cirrhosis (OR: 3.65, 1.82-7.04), low platelets (<50,000 cells/mm3; OR: 2.2
When there are multiple outcome series of interest, Synthetic Control analyses typically proceed by estimating separate weights for each outcome. In this paper, we instead propose estimating a common set of weights across outcomes, by balancing either a vector of all outcomes or an index or average of them. Under a low-rank factor model, we show that these approaches lead to lower bias bounds than separate weights, and that averaging leads to further gains when the number of outcomes grows. We illustrate this via a re-analysis of the impact of the Flint water crisis on educational outcomes.
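The "average of outcomes" variant can be sketched as a single simplex-constrained least-squares fit. Each outcome is standardized, the standardized pre-treatment series are averaged across outcomes, and one weight vector is found that balances that average. We assume standardization by each outcome's donor-pool standard deviation and solve by projected gradient descent with a sort-based simplex projection. The paper may standardize or solve differently, and all names and data below are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def common_weights(donors, treated, n_iter=5000):
    """One set of synthetic-control weights balancing the average standardized outcome.

    donors:  array (K outcomes, T0 pre-periods, J donor units)
    treated: array (K outcomes, T0 pre-periods)
    """
    scale = donors.std(axis=(1, 2), keepdims=True)  # one scale per outcome, shape (K, 1, 1)
    X = (donors / scale).mean(axis=0)               # (T0, J) averaged donor series
    y = (treated / scale[:, :, 0]).mean(axis=0)     # (T0,) averaged treated series
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1 / Lipschitz constant of the gradient
    w = np.full(X.shape[1], 1.0 / X.shape[1])
    for _ in range(n_iter):
        w = project_simplex(w - step * X.T @ (X @ w - y))
    return w


rng = np.random.default_rng(0)
donors = rng.normal(size=(3, 20, 5))  # 3 outcomes, 20 pre-periods, 5 donors
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated = donors @ w_true             # treated unit is an exact convex combination
w_hat = common_weights(donors, treated)
```

Separate weights would instead repeat this fit once per outcome. The common-weight version uses all K series to pin down a single `w`, which is the source of the bias-bound gains the abstract describes under a low-rank factor model.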
Mobile applications and other information and communication technology (ICT) tools have become widely used in education to monitor teaching and learning activities. The analysis of student learning through evaluation is a growing area of interest for teachers in higher education who aim to enhance students' learning experience. This paper describes the development of a student outcomes monitoring tool that applies analytics to give students feedback as they progress toward the intended learning outcomes. The tool focuses on the core elements of the curriculum and provides detailed student outcomes by tracking and analyzing course evaluation results and records. The data revealed that the student outcomes monitoring and analytics tool is adequate for providing continuous feedback to students on their achievement of the desired learning outcomes, and for supporting teachers in planning teaching and learning activities, enhancing the feedback system, and informing academic planning and improvement.
Studies that collect multi-outcome data, such as tobacco and alcohol use, are becoming increasingly common. In principle, multi-outcome studies investigate the relationships between outcomes, including causal links and/or joint distributions. Although there are many methods for studying multivariate outcomes, significant limitations regarding scale and interpretation persist. Here we introduce an exponential-family model for binary outcomes that provides a flexible framework for hypothesis testing of multiple binary outcomes in a computationally efficient fashion.
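One common exponential-family model for several binary outcomes is the quadratic (Ising-type) log-linear model. It has main-effect parameters for each outcome and a pairwise interaction parameter for each pair. We do not know that this is the paper's exact parameterization; the sketch only shows, for two outcomes, why such a model supports simple hypothesis tests. The pairwise parameter equals the log odds ratio, so testing it against zero is a test of association. Function names and parameter values are hypothetical.

```python
import math
from itertools import product


def joint_pmf(theta1, theta2, lam):
    """P(y1, y2) proportional to exp(theta1*y1 + theta2*y2 + lam*y1*y2), y in {0,1}^2."""
    weights = {
        y: math.exp(theta1 * y[0] + theta2 * y[1] + lam * y[0] * y[1])
        for y in product((0, 1), repeat=2)
    }
    z = sum(weights.values())  # normalizing constant: only 4 terms for 2 outcomes
    return {y: w / z for y, w in weights.items()}


def log_odds_ratio(p):
    """log[P(1,1) P(0,0) / (P(1,0) P(0,1))]; the theta terms cancel, leaving lam."""
    return math.log(p[1, 1] * p[0, 0] / (p[1, 0] * p[0, 1]))


p = joint_pmf(0.3, -0.7, 1.2)
q = joint_pmf(0.3, -0.7, 0.0)  # lam = 0: the two outcomes are independent
```

With many outcomes, the normalizing constant sums over 2^K configurations, which is where the scale limitations mentioned above tend to arise.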
Dynamic treatment regimes formalize precision medicine as a sequence of decision rules, one for each stage of clinical intervention, that map current patient information to a recommended intervention. Optimal regimes are typically defined as maximizing some functional of a scalar outcome's distribution, e.g., the distribution's mean or median. However, in many clinical applications, there are multiple outcomes of interest. We consider the problem of estimating an optimal regime when there are multiple outcomes that are ordered by priority but which cannot be readily combined by domain experts into a meaningful single scalar outcome. We propose a definition of optimality in this setting and show that an optimal regime with respect to this definition leads to maximal mean utility under a large class of utility functions. Furthermore, we use inverse reinforcement learning to identify a composite outcome that most closely aligns with our definition within a pre-specified class. Simulation experiments and an application to data from a sequential multiple assignment randomized trial (SMART) on HIV/STI prevention illustrate the usefulness of the proposed approach.
In the quest to make defensible causal claims from observational data, it is sometimes possible to leverage information from "placebo treatments" and "placebo outcomes". Existing approaches employing such information focus largely on point identification and assume (i) "perfect placebos", meaning placebo treatments have precisely zero effect on the outcome and the real treatment has precisely zero effect on a placebo outcome; and (ii) "equiconfounding", meaning that the treatment-outcome relationship where one is a placebo suffers the same amount of confounding as does the real treatment-outcome relationship, on some scale. We instead consider an omitted variable bias framework, in which users can postulate ranges of values for the degree of unequal confounding and the degree of placebo imperfection. Once postulated, these assumptions identify or bound the linear estimates of treatment effects. Our approach also does not require using both a placebo treatment and a placebo outcome, as some others do. While applicable in many settings, one ubiquitous use case for this approach is to employ pre-treatment outcomes as (perfect) placebo outcomes, as in difference-in-differences. The parall
We generalize the polynomial-time outcome-complete simulation algorithm for stabilizer circuits in arXiv:2309.08676 to track global phases exactly, yielding what we call phased outcome-complete simulation. The original algorithm enabled equivalence checking of stabilizer circuits with intermediate measurements and conditional Pauli corrections for all input states and all measurement outcomes simultaneously, but it tracked quantum states only up to a global phase. Our generalization removes this limitation and enables equivalence checking for an important family of non-stabilizer circuits: stabilizer circuits augmented with single-qubit rotations $\exp(i\alpha Z)$ by symbolic angles. Two such circuits are equivalent if they implement the same quantum channel for all values of the symbolic angles and all measurement outcomes, given a one-to-one correspondence between rotation angles in the two circuits and a mapping between measurement outcomes. This model enables testing of compilation algorithms that transform the Clifford portions of a computation while preserving rotation angles. Examples include Pauli-based computation, edge-disjoint path compilation for surface codes, and custom com