We study the problem of understanding where two populations differ within a feature space, which we formalize in the concept of a differential subgroup: a subset of individuals from both populations who, despite sharing similar characteristics, exhibit exceptional differences in a target outcome. Differential subgroups reveal the regions of the feature space where population-level gaps are most pronounced and can help practitioners identify the covariate combinations that are structurally responsible for these differences, e.g.~in clinical analysis, model diagnostics, or treatment-effect studies. We introduce a general optimization objective for discovering differential subgroups and establish conditions under which the resulting subgroups admit a causal interpretation of population differences. We propose DiffSub, a gradient-based approach that discovers interpretable differential subgroups in tabular data. Across synthetic benchmarks, medical case studies, model-error analyses, and treatment-effect settings, DiffSub identifies informative subgroups that reveal where population differences arise and why.
Triple difference designs have become increasingly popular in empirical economics. The advantage of a triple difference design is that, within a treatment group, it allows for another subgroup of the population -- potentially less impacted by the treatment -- to serve as a control for the subgroup of interest. While literature on difference-in-differences has discussed heterogeneity in treatment effects between treated and control groups or over time, little attention has been given to the implications of heterogeneity in treatment effects between subgroups. In this paper, I show that the parameter identified under the usual triple difference assumptions does not allow for causal interpretation of differences between subgroups when subgroups may differ in their underlying (unobserved) treatment effects. I propose a new parameter of interest, the causal difference in average treatment effects on the treated, which makes causal comparisons between subgroups. I discuss assumptions for identification and derive the semiparametric efficiency bounds for this parameter. I then propose doubly-robust, efficient estimators for this parameter. I use a simulation study to highlight the desirab
In this paper, we study the impact of two different formalisms of quantum decoherence on the sensitivities of two future long-baseline experiments, DUNE and P2SO. In Formalism-A, we assume that the decoherence matrix is defined in the matter mass eigenstate basis, i.e., the basis that diagonalizes the Hamiltonian for neutrinos in matter with a constant matter density. In Formalism-B, we define the decoherence matrix in the vacuum mass eigenstate basis and then rotate it to the matter mass basis via a unitary transformation. Using different values of the decoherence parameter $Γ$, we show how these two formalisms differ at the probability level and then demonstrate how the sensitivities differ at the $χ^2$ level. Our results show that if the value of $Γ$ is small, the two formalisms yield the same probability in vacuum. However, if the value of $Γ$ is large or if there is a strong matter effect, the two formalisms yield very different results.
In this paper, we formalize a triple instrumented difference-in-differences (DID-IV). In this design, a triple Wald-DID estimand, which divides the difference-in-difference-in-differences (DDD) estimand of the outcome by the DDD estimand of the treatment, captures the local average treatment effect on the treated. The identifying assumptions mainly comprise a monotonicity assumption, and the common acceleration assumptions in the treatment and the outcome. We extend the canonical triple DID-IV design to staggered instrument cases. We also describe the estimation and inference in this design in practice.
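As a rough illustration of the ratio described above, the following sketch computes a triple Wald-DID from cell means. It assumes the three differencing dimensions are a group indicator `g`, a subgroup indicator `s`, and a period indicator `t`, each binary. That labelling, the function names `ddd` and `triple_wald_did`, and the numbers are hypothetical and not taken from the paper.

```python
def ddd(means):
    """Difference-in-difference-in-differences of cell means.

    `means` maps (g, s, t) -> mean, with g, s, t in {0, 1}.
    The (g, s, t) labelling is an illustrative assumption.
    """
    def dd(g):
        # Within group g: change over time for s=1 minus change for s=0.
        return (means[(g, 1, 1)] - means[(g, 1, 0)]) - (
            means[(g, 0, 1)] - means[(g, 0, 0)]
        )
    return dd(1) - dd(0)


def triple_wald_did(outcome_means, treatment_means):
    """Ratio of the DDD of the outcome to the DDD of the treatment."""
    denominator = ddd(treatment_means)
    if denominator == 0:
        raise ValueError("DDD of the treatment is zero; the ratio is undefined")
    return ddd(outcome_means) / denominator


# Made-up cell means, for illustration only.
outcome = {
    (1, 1, 1): 10.0, (1, 1, 0): 5.0, (1, 0, 1): 4.0, (1, 0, 0): 3.0,
    (0, 1, 1): 6.0, (0, 1, 0): 4.0, (0, 0, 1): 3.0, (0, 0, 0): 3.0,
}
treatment = {
    (1, 1, 1): 0.8, (1, 1, 0): 0.1, (1, 0, 1): 0.2, (1, 0, 0): 0.1,
    (0, 1, 1): 0.2, (0, 1, 0): 0.1, (0, 0, 1): 0.1, (0, 0, 0): 0.1,
}
estimate = triple_wald_did(outcome, treatment)
```

The sketch covers only the point estimate in the canonical (non-staggered) case; inference and the staggered extension are outside its scope.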
Many policy evaluations involve vectors of category-specific quantities, either categorical outcomes (e.g., employment type, major choice) or compositional measures (e.g., GDP by sector, votes by party, electricity generation by source). In these settings, both intensive margins (shares) and extensive margins (totals) can matter. However, existing Difference-in-Differences (DiD) strategies typically focus only on the shares and do not jointly identify treatment effects on totals. In addition, these approaches usually lack a clear economic interpretation. I develop Compositional Difference-in-Differences (CoDiD), a new framework that identifies treatment effects on both shares and totals in a coherent way. The key assumption is parallel growth: in the absence of treatment, the log-quantities of each category would have evolved in parallel for the treated and control groups. I show that, under a random-utility discrete-choice model, this condition is equivalent to parallel trends in expected utilities, meaning that the change in average latent attractiveness for each alternative is identical across groups. Furthermore, geometrically, the counterfactual distributions (shares) follow p
Triple difference-in-differences designs are widely used to estimate causal effects in empirical work. Surveying the literature, we find that most applications include controls. We show that this standard practice is generally biased for the target causal estimand when covariate distributions differ across groups. To address this, we propose identifying a causal estimand by fixing the covariate distribution to that of one group. We then develop a double-robust estimator and illustrate its application in a canonical policy setting.
Most general population web surveys are based on online panels maintained by commercial survey agencies. However, survey agencies differ in their panel selection and management strategies. Little is known about whether these different strategies cause differences in survey estimates. This paper presents the results of a systematic study designed to analyze the differences in web survey results between agencies. Six different survey agencies were commissioned with the same web survey using an identical standardized questionnaire covering factual health items. Five surveys were fielded at the same time. A calibration approach was used to control for the effect of demographics on the outcome. Overall, the results show differences between probability and non-probability surveys in health estimates, which were reduced but not eliminated by weighting. Furthermore, the differences between non-probability surveys before and after weighting are larger than would be expected between random samples from the same population.
Urban vibrancy is the dynamic activity of humans in urban locations. It can vary with urban features and the opportunities for human interactions, but it might also differ according to the underlying social conditions of city inhabitants across and within social surroundings. Such heterogeneity in how different demographic groups may experience cities has the potential to cause gender segregation because of differences in the preferences of inhabitants, their accessibility and opportunities, and large-scale mobility behaviours. However, traditional studies have failed to fully capture a high-frequency understanding of how urban vibrancy is linked to urban features, how this might differ for different genders, and how this might affect segregation in cities. Our results show that (1) there are differences between males and females in terms of urban vibrancy, (2) the differences relate to 'Points of Interest' as well as transportation networks, and (3) there are both positive and negative 'spatial spillovers' across each city. To do this, we use a quantitative approach using Call Detail Record data--taking advantage of the near-ubiquitous use of mobile phones--to gain h
Methods for automatic chemical retrosynthesis have found recent success through the application of models traditionally built for natural language processing, primarily through transformer neural networks. These models have demonstrated significant ability to translate between the SMILES encodings of chemical products and reactants, but are constrained as a result of their autoregressive nature. We propose DiffER, an alternative template-free method for retrosynthesis prediction in the form of categorical diffusion, which allows the entire output SMILES sequence to be predicted in unison. We construct an ensemble of diffusion models which achieves state-of-the-art performance for top-1 accuracy and competitive performance for top-3, top-5, and top-10 accuracy among template-free methods. We prove that DiffER is a strong baseline for a new class of template-free model, capable of learning a variety of synthetic techniques used in laboratory settings and outperforming a variety of other template-free methods on top-k accuracy metrics. By constructing an ensemble of categorical diffusion models with a novel length prediction component with variance, our method is able to approximately
Difference-in-differences is one of the most widely used identification strategies in empirical work in economics. This chapter reviews a number of important, recent developments related to difference-in-differences. First, this chapter reviews recent work pointing out limitations of two-way fixed effects regressions (panel data regressions that have been the dominant approach to implementing difference-in-differences identification strategies) that arise in empirically relevant settings where there are more than two time periods, variation in treatment timing across units, and treatment effect heterogeneity. Second, this chapter reviews recently proposed alternative approaches that are able to circumvent these issues without being substantially more complicated to implement. Third, this chapter covers a number of extensions to these results, paying particular attention to (i) parallel trends assumptions that hold only after conditioning on observed covariates and (ii) strategies to partially identify causal effect parameters in difference-in-differences applications in cases where the parallel trends assumption may be violated.
We propose the Sequential Synthetic Difference-in-Differences (Sequential SDiD) estimator for event studies with staggered treatment adoption, particularly when the parallel trends assumption fails. The method uses an iterative imputation procedure on aggregated data, where estimates for early-adopting cohorts are used to construct counterfactuals for later ones. We prove the estimator is asymptotically equivalent to an infeasible oracle OLS estimator within a linear model with interactive fixed effects. This key theoretical result provides a foundation for standard inference by establishing asymptotic normality and clarifying the estimator's efficiency. By offering a robust and transparent method with formal statistical guarantees, Sequential SDiD is a powerful alternative to conventional difference-in-differences strategies.
We explore from several perspectives the following question: given $X\subseteq \mathbb{Z}$ and $N\in \mathbb{N}$, what is the maximum size $D(X,N)$ of $A\subseteq \{1,2,\dots,N\}$ before $A$ is forced to contain two distinct elements that differ by an element of $X$? The set of forbidden differences, $X$, is called \textit{intersective} if $D(X,N)=o(N)$, with the most well-studied examples being $X=S=\{n^2: n\in \mathbb{N}\}$ and $X=\mathcal{P}-1=\{p-1: p\text{ prime}\}$. In addition to some new results, including exact formulas and estimates for $D(X,N)$ in some non-intersective cases like $X=\mathcal{P}$ and $X=S+k$, $k\in \mathbb{N}$, we also provide a comprehensive survey of known bounds and extensive computational data. In particular, we utilize an existing algorithm for finding maximum cliques in graphs to determine $D(S,N)$ for $N\leq 300$ and $D(\mathcal{P}-1,N)$ for $N\leq 500$. None of these exact values appear previously in the literature.
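To make the definition of $D(X,N)$ concrete, here is a minimal brute-force sketch. It uses plain backtracking with a simple size bound, not the maximum-clique algorithm the abstract refers to, so it is only practical for small $N$. The function names `max_avoiding_set` and `squares_up_to` are hypothetical.

```python
from math import isqrt


def max_avoiding_set(forbidden, n):
    """Return a largest A within {1, ..., n} such that no two distinct
    elements of A differ by a member of `forbidden` (a set of positive ints)."""
    best = []

    def extend(start, chosen):
        nonlocal best
        if len(chosen) > len(best):
            best = chosen[:]
        # Prune: even taking every remaining value cannot beat `best`.
        if len(chosen) + (n - start + 1) <= len(best):
            return
        for v in range(start, n + 1):
            # v is larger than every chosen element, so v - a > 0.
            if all((v - a) not in forbidden for a in chosen):
                chosen.append(v)
                extend(v + 1, chosen)
                chosen.pop()

    extend(1, [])
    return best


def squares_up_to(n):
    """Positive perfect squares not exceeding n."""
    return {k * k for k in range(1, isqrt(n) + 1)}


# D(S, 5) for S the set of squares: differences 1 and 4 are forbidden.
d_squares_5 = len(max_avoiding_set(squares_up_to(5), 5))
# With X = {1}, the best choice is every other integer.
d_consecutive_7 = len(max_avoiding_set({1}, 7))
```

Here `d_squares_5` is 2 (for example $\{1,3\}$, since $\{1,3,5\}$ fails because $5-1=4$), and `d_consecutive_7` is 4.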
We formulate factorial difference-in-differences (FDID), a research design that extends canonical difference-in-differences (DID) to settings in which an event affects all units. In many panel data applications, researchers exploit cross-sectional variation in a baseline factor alongside temporal variation in the event, but the corresponding estimand is often implicit and the justification for applying the DID estimator remains unclear. We frame FDID as a factorial design with two factors, the baseline factor $G$ and the exposure level $Z$, and define effect modification and causal moderation as the associative and causal effects of $G$ on the effect of $Z$, respectively. Under standard DID assumptions of no anticipation and parallel trends, the DID estimator identifies effect modification but not causal moderation. Identifying the latter requires an additional \emph{factorial parallel trends} assumption, that is, mean independence between $G$ and potential outcome trends. We extend the framework to conditionally valid assumptions and regression-based implementations, and further to repeated cross-sectional data and continuous $G$. We demonstrate the framework with an empirical app
Worldviews may differ significantly according to political orientation. Even a single word can have a completely different meaning depending on political orientation. However, no direct evidence has been obtained on differences in the semantic processing of single words in naturalistic information between individuals with different political orientations. The present study aimed to fill this gap. We measured electroencephalographic signals while participants with different political orientations listened to naturalistic content. Responses for moral-, ideology-, and policy-related words between and within the participant groups were then compared. Within-group comparisons showed that right-leaning participants reacted more to moral-related words than to policy-related words, while left-leaning participants reacted more to policy-related words than to moral-related words. In addition, between-group comparisons also showed that neural responses for moral-related words were greater in right-leaning participants than in left-leaning participants and those for policy-related words were lesser in right-leaning participants than in neutral participants. There was a significant correlation
Difference-in-differences (DID) is commonly used to estimate treatment effects but is infeasible in settings where data are unpoolable due to privacy concerns or legal restrictions on data sharing, particularly across jurisdictions. In this study, we identify and relax the assumption of data poolability in DID estimation. We propose an innovative approach to estimate DID with unpoolable data (UN-DID) which can accommodate covariates, multiple groups, and staggered adoption. Through analytical proofs and Monte Carlo simulations, we show that UN-DID and conventional DID estimates of the average treatment effect and standard errors are equal and unbiased in settings without covariates. With covariates, both methods produce estimates that are unbiased, equivalent, and converge to the true value. The estimates differ slightly but the statistical inference and substantive conclusions remain the same. Two empirical examples with real-world data further underscore UN-DID's utility. The UN-DID method allows the estimation of cross-jurisdictional treatment effects with unpoolable data, enabling better counterfactuals to be used and new research questions to be answered.
In this paper, we describe a computational implementation of the synthetic difference-in-differences (SDID) estimator of Arkhangelsky et al. (2021) for Stata. Synthetic difference-in-differences can be used in a wide class of circumstances where the effects of some particular policy or event are of interest, and repeated observations on treated and untreated units are available over time. We lay out the theory underlying SDID, both when there is a single treatment adoption date and when adoption is staggered over time, and discuss estimation and inference in each of these cases. We introduce the sdid command, which implements these methods in Stata, and provide a number of examples of use, discussing estimation, inference, and visualization of results.
We propose a new method for estimating causal effects in longitudinal/panel data settings that we call generalized difference-in-differences. Our approach unifies two alternative approaches in these settings: ignorability estimators (e.g., synthetic controls) and difference-in-differences (DiD) estimators. We propose a new identifying assumption -- a stable bias assumption -- which generalizes the conditional parallel trends assumption in DiD, leading to the proposed generalized DiD framework. This change gives generalized DiD estimators the flexibility of ignorability estimators while maintaining the robustness to unobserved confounding of DiD. We also show how ignorability and DiD estimators are special cases of generalized DiD. We then propose influence-function based estimators of the observed data functional, allowing the use of double/debiased machine learning for estimation. We also show how generalized DiD easily extends to include clustered treatment assignment and staggered adoption settings, and we discuss how the framework can facilitate estimation of other treatment effects beyond the average treatment effect on the treated. Finally, we provide simulations which show t
ResearchGate has emerged as a popular professional network for scientists and researchers in a very short span of time. Similar to Google Scholar, ResearchGate indexing uses an automatic crawling algorithm that extracts bibliographic data, citations and other information about scholarly articles from various sources. However, it has been observed that the two platforms often show different publication and citation data for the same institutions, journals and authors. This paper, therefore, attempts to analyse and measure the differences in publication counts, citations and other metrics of the two platforms for a large data set of highly cited authors. The results indicate that there are significant differences in publication counts and citations for the same authors in the two platforms, with Google Scholar having higher counts in the vast majority of cases. The metrics computed by the two platforms also differ in their values, showing different degrees of correlation. The coverage policy, indexing errors, author attribution mechanism and strategy to deal with predatory publishing are found to be the main probable reasons for the differences in the two
Unmeasured confounding is a key threat to reliable causal inference based on observational studies. Motivated by two powerful natural experiment devices, instrumental variables and difference-in-differences, we propose a new method called instrumented difference-in-differences that explicitly leverages exogenous randomness in an exposure trend to estimate the average and conditional average treatment effects in the presence of unmeasured confounding. We develop the identification assumptions using the potential outcomes framework. We propose a Wald estimator and a class of multiply robust and efficient semiparametric estimators, with provable consistency and asymptotic normality. In addition, we extend instrumented difference-in-differences to a two-sample design to facilitate investigations of delayed treatment effects and provide a measure of weak identification. We demonstrate our results in simulated and real datasets.
When does the presence of an outlier in some measured property indicate that the outlying object differs qualitatively, rather than quantitatively, from other members of its apparent class? Historical examples include the many types of supernovæ and short {\it vs.\/} long Gamma Ray Bursts. There may be only one parameter and one outlier, so that principal component analyses are inapplicable. A qualitative difference implies that some parameter has a characteristic scale, and hence its distribution cannot be a power law (that can have no such scale). If the distribution is a power law the objects differ only quantitatively. The applicability of a power law to an empirical distribution may be tested by comparing the most extreme member to its next-most extreme. The probability distribution of their ratio is calculated, and compared to data for stars, radio and X-ray sources, and the fluxes, fluences and rotation measures of Fast Radio Bursts. It is found with high statistical significance that the giant outburst of soft gamma repeater SGR 1806-20 differed qualitatively from its lesser outbursts and FRB 200428 differed qualitatively from other FRB (by location in the Galaxy), but that