When a subgroup is identified from the data, it must be evaluated in a replicable way. The usual in-sample approach, which evaluates the post-hoc identified subgroup as if it were predefined, might suffer from selection bias. This issue of in-sample evaluation of data-dependent objects is well recognized but particularly challenging here. Unlike the discrete or finite-dimensional data-dependent objects addressed before, the selection bias here is induced by post-hoc identified subgroups, which are data-dependent sets potentially defined by infinite-dimensional functionals with nonsmooth boundaries, a feature known as nonregularity. The out-of-sample approach, which splits data for subgroup identification and evaluation, can help address selection bias but might suffer from efficiency loss and instability. In this paper, we propose a conditional adaptive perturbation approach to remove selection bias in in-sample subgroup evaluation and deliver valid inference on subgroups identified from the whole dataset by generic machine learning, regardless of whether regularity is satisfied. The proposed method is easy to compute, allows model-free and even black-box subgroup identification, and achieves full efficiency across b
Many statistical and econometric problems involve parameters defined by moments of a joint distribution when only marginal distributions are observed, leading naturally to partial identification. We develop a methodology for identification, estimation, and inference in the corresponding partially identified GMM model. We characterize the sharp identified set for the parameter of interest via a support-function/optimal-transport (OT) representation. To estimate the identified set, we employ entropic regularization, which yields a smooth approximation to the classical OT problem that can be computed efficiently using the Sinkhorn algorithm. We also propose a test statistic for hypothesis testing and the construction of confidence regions for the identified set. To derive its asymptotic distribution, we establish a novel central limit theorem for the entropic OT value under general smooth cost functions. We then obtain valid critical values using the bootstrap for directionally differentiable functionals of Fang and Santos (2019). The resulting testing procedure controls size locally uniformly, including at parameter values on the boundary of the identified set. We demonstrate good fi
We study the properties of the classical \emph{projection} method to conduct simultaneous inference about the coefficients of the structural impulse-response function and their identified set in Structural Vector Autoregressions. We show that -- as the sample size grows large -- projection inference produces regions for the structural parameters and their identified set with both frequentist coverage and robust Bayesian credibility of at least $1-α$. We then calibrate the radius of the Wald ellipsoid to guarantee that -- for a given posterior on the reduced-form parameters -- the robust Bayesian credibility of the projection method is exactly $1-α$. We illustrate the main results of the paper using a demand/supply model of the U.S.~labor market.
A partially identified model, in which the parameters cannot be uniquely identified, often arises in statistical analysis. Researchers frequently use Bayesian inference to analyze such models, but when an off-the-shelf MCMC sampling algorithm is applied to a partially identified model, the computational performance can be poor. Importance sampling with a transparent reparameterization (TP) is one remedy. This method is preferable because the model is known to be identified with respect to the new parameterization, and at the same time it may allow faster, i.i.d. Monte Carlo sampling by using conjugate convenience priors. In this paper, we explain the importance sampling method with the TP and with a pseudo-TP. We introduce the pseudo-TP as an alternative to the TP because finding a TP is sometimes difficult. Then, we test the methods' performance in some scenarios and compare it to the performance of an off-the-shelf MCMC method, Gibbs sampling, applied in the original parameterization. While the importance sampling with TP (ISTP) shows generally better results than off-the-shelf MCMC methods, as seen in the compute time and trace p
This paper considers the problem of ranking objects based on their latent merits using data from pairwise interactions. We allow for incomplete observation of these interactions and study what can be inferred about rankings in such settings. First, we show that identification of the ranking depends on a trade-off between the tournament graph and the interaction function: in parametric models, such as the Bradley-Terry-Luce, rankings are point identified even with sparse graphs, whereas nonparametric models require dense graphs. Second, moving beyond point identification, we characterize the identified set in the nonparametric model under any tournament structure and represent it through moment inequalities. Finally, we propose a likelihood-based statistic to test whether a ranking belongs to the identified set. We study two testing procedures: one is finite-sample valid but computationally intensive; the other is easy to implement and valid asymptotically. We illustrate our results using Brazilian employer-employee data to study how workers rank firms when moving across jobs.
This paper studies the identification of Structural Vector Autoregressions (SVARs) exploiting a break in the variances of the structural shocks. Point-identification for this class of models relies on an eigen-decomposition involving the covariance matrices of reduced-form errors and requires that all the eigenvalues are distinct. This point-identification, however, fails in the presence of multiplicity of eigenvalues. This occurs in an empirically relevant scenario where, for instance, only a subset of structural shocks had the break in their variances, or where a group of variables shows a variance shift of the same amount. Together with zero or sign restrictions on the structural parameters and impulse responses, we derive the identified sets for impulse responses and show how to compute them. We perform inference on the impulse response functions, building on the robust Bayesian approach developed for set identified SVARs. To illustrate our proposal, we present an empirical example based on the literature on the global crude oil market where the identification is expected to fail due to multiplicity of eigenvalues.
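The role of distinct eigenvalues can be illustrated numerically. The sketch below is not the paper's procedure; it assumes the common two-regime parameterization in which the reduced-form error covariances are $Σ_1 = BB'$ and $Σ_2 = BΛB'$ with $Λ$ diagonal, so that the eigenvalues of $Σ_1^{-1}Σ_2$ equal the diagonal of $Λ$. The function names, the matrix `B`, and the tolerance are hypothetical choices made for illustration.

```python
import numpy as np

def relative_variance_eigenvalues(sigma1, sigma2):
    """Eigenvalues of sigma1^{-1} sigma2, computed through a symmetric form.

    With sigma1 = L L' (Cholesky), the matrix L^{-1} sigma2 L^{-T} is
    symmetric and similar to sigma1^{-1} sigma2, so eigvalsh returns the
    same eigenvalues as real numbers in ascending order.
    """
    chol = np.linalg.cholesky(sigma1)
    half = np.linalg.solve(chol, sigma2)   # L^{-1} sigma2
    m = np.linalg.solve(chol, half.T)      # L^{-1} sigma2 L^{-T}
    return np.linalg.eigvalsh((m + m.T) / 2.0)

def has_repeated_eigenvalue(eigenvalues, tol=1e-8):
    """True if two eigenvalues coincide up to the (assumed) tolerance."""
    return bool(np.any(np.diff(np.sort(eigenvalues)) < tol))

# Hypothetical structural impact matrix (invertible).
B = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.4, 1.0]])

# Case 1: only the third shock changes variance, so 1 is a repeated eigenvalue.
lam_partial = np.diag([1.0, 1.0, 3.0])
# Case 2: all relative variances differ, so eigenvalues are distinct.
lam_full = np.diag([0.5, 2.0, 3.0])

sigma1 = B @ B.T
eig_partial = relative_variance_eigenvalues(sigma1, B @ lam_partial @ B.T)
eig_full = relative_variance_eigenvalues(sigma1, B @ lam_full @ B.T)

print(eig_partial, has_repeated_eigenvalue(eig_partial))
print(eig_full, has_repeated_eigenvalue(eig_full))
```

In the first case the eigenvectors associated with the repeated eigenvalue are determined only up to a rotation within their eigenspace, which is the source of the set identification discussed in the abstract. In applied work the covariances are estimated, so an exact tolerance check like this one would have to be replaced by a statistical test.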
We classify graphs and, more generally, finite relational structures that are identified by C2, that is, two-variable first-order logic with counting. Using this classification, we show that it can be decided in almost linear time whether a structure is identified by C2. Our classification implies that for every graph identified by this logic, all vertex-colored versions of it are also identified. A similar statement is true for finite relational structures. We provide constructions that solve the inversion problem for finite structures in linear time. This problem has previously been shown to be polynomial time solvable by Martin Otto. For graphs, we conclude that every C2-equivalence class contains a graph whose orbits are exactly the classes of the C2-partition of its vertex set and which has a single automorphism witnessing this fact. For general k, we show that such statements are not true by providing examples of graphs of size linear in k which are identified by C3 but for which the orbit partition is strictly finer than the Ck-partition. We also provide identified graphs which have vertex-colored versions that are not identified by Ck.
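For graphs, the C2-partition of the vertex set is commonly computed by color refinement (also called the 1-dimensional Weisfeiler-Leman algorithm). The following is a minimal sketch of that standard procedure, not the almost-linear-time algorithm of the paper; the function name and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import Counter

def color_refinement(adj):
    """Return the stable coloring of an undirected graph.

    adj maps each vertex to an iterable of its neighbors. Two vertices end
    up with the same color exactly when color refinement cannot tell them
    apart.
    """
    colors = {v: 0 for v in adj}
    while True:
        # A vertex's signature is its old color plus the multiset of
        # neighbor colors, so each round can only split classes.
        signatures = {
            v: (colors[v],
                tuple(sorted(Counter(colors[u] for u in adj[v]).items())))
            for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors

# Path 0 - 1 - 2: the endpoints share a class, the middle vertex does not.
path = {0: [1], 1: [0, 2], 2: [1]}
path_colors = color_refinement(path)

# A 6-cycle is regular, so refinement never splits the single class.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
cycle_colors = color_refinement(cycle)

print(path_colors)
print(len(set(cycle_colors.values())))
```

The 6-cycle shows why a graph can fail to be identified: a disjoint union of two triangles is also 2-regular on six vertices and receives the same single-class coloring.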
We provide a comprehensive semi-parametric study of Bayesian partially identified econometric models. While the existing literature on Bayesian partial identification has mostly focused on the structural parameter, our primary focus is on Bayesian credible sets (BCSs) of the unknown identified set and the posterior distribution of its support function. We construct a (two-sided) BCS based on the support function of the identified set. We prove the Bernstein-von Mises theorem for the posterior distribution of the support function. This powerful result in turn implies that, while the BCS and the frequentist confidence set for the partially identified parameter are asymptotically different, our constructed BCS for the identified set has an asymptotically correct frequentist coverage probability. Importantly, we illustrate that the constructed BCS for the identified set does not require a prior on the structural parameter. It can be computed efficiently for subset inference, especially when the target of interest is a sub-vector of the partially identified parameter, where projecting to a low-dimensional subset is often required. Hence, the proposed methods are useful in many applicati
IceCube has reported evidence for neutrino emission from the Seyfert-II galaxy NGC 1068 and the blazar TXS 0506+056. The former was identified in a time-integrated search, and the latter using time-dependent and multi-messenger methods. A natural question is: are sources identified in time-integrated searches consistent with a steady neutrino source? We present a non-parametric method, TAUNTON, to answer this question. Motivated by the Cramér-von Mises test, TAUNTON is an unbinned single-hypothesis method to identify deviations in neutrino data from the steady hypothesis. An advantage of TAUNTON is that it is sensitive to arbitrary deviations from the steady hypothesis. Here we present results of TAUNTON applied to an 8.7-year data set of muon neutrino track events, the same data used to identify NGC 1068 at 4.2$σ$. We use TAUNTON on 51 objects, a subset (with >4 signal neutrinos) of the 110 objects studied in the NGC 1068 publication. We set a threshold of 3$σ$ pre-trial to identify sources inconsistent with the steady hypothesis. TAUNTON reports a p-value of 0.9 for NGC 1068, consistent with the steady hypothesis. Using the time integrated fit, data for TXS 0506+056 is consiste
Identified particle spectra represent a crucial tool to understand the behavior of the matter created in high-energy heavy-ion collisions. The transverse momentum p_T distributions of identified hadrons contain information about the transverse expansion of the system and constrain the freeze-out properties of the matter created. The ALICE experiment has good particle identification performance over a broad p_T range. In this contribution the results for identified pions, kaons and protons in heavy-ion collisions at 2.76 TeV center-of-mass energy are presented. These results are compared with other identified particle measurements obtained by previous experiments, and discussed in terms of the thermal and hydrodynamic pictures. The status of extensions of this analysis, with the study of identified particles as a function of event-by-event flow in Pb-Pb collisions, is also discussed.
In complicated/nonlinear parametric models, it is generally hard to know whether the model parameters are point identified. We provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of full parameters and of subvectors in models defined through a likelihood or a vector of moment equalities or inequalities. These CSs are based on level sets of optimal sample criterion functions (such as likelihood or optimally-weighted or continuously-updated GMM criterions). The level sets are constructed using cutoffs that are computed via Monte Carlo (MC) simulations directly from the quasi-posterior distributions of the criterions. We establish new Bernstein-von Mises (or Bayesian Wilks) type theorems for the quasi-posterior distributions of the quasi-likelihood ratio (QLR) and profile QLR in partially-identified regular models and some non-regular models. These results imply that our MC CSs have exact asymptotic frequentist coverage for identified sets of full parameters and of subvectors in partially-identified regular models, and have valid but potentially conservative coverage in models with reduced-form parameters on the boundary. Our MC CSs for
In a variety of applications, including nonparametric instrumental variable (NPIV) analysis, proximal causal inference under unmeasured confounding, and missing-not-at-random data with shadow variables, we are interested in inference on a continuous linear functional (e.g., average causal effects) of a nuisance function (e.g., NPIV regression) defined by conditional moment restrictions. These nuisance functions are generally weakly identified, in that the conditional moment restrictions can be severely ill-posed as well as admit multiple solutions. This is sometimes resolved by imposing strong conditions that imply the function can be estimated at rates that make inference on the functional possible. In this paper, we study a novel condition for the functional to be strongly identified even when the nuisance function is not; that is, the functional is amenable to asymptotically-normal estimation at $\sqrt{n}$-rates. The condition implies the existence of debiasing nuisance functions, and we propose penalized minimax estimators for both the primary and debiasing nuisance functions. The proposed nuisance estimators can accommodate flexible function classes, and importantly they can con
I develop algorithms to facilitate Bayesian inference in structural vector autoregressions that are set-identified with sign and zero restrictions by showing that the system of restrictions is equivalent to a system of sign restrictions in a lower-dimensional space. Consequently, algorithms applicable under sign restrictions can be extended to allow for zero restrictions. Specifically, I extend algorithms proposed in Amir-Ahmadi and Drautzburg (2021) to check whether the identified set is nonempty and to sample from the identified set without rejection sampling. I compare the new algorithms to alternatives by applying them to variations of the model considered by Arias et al. (2019), who estimate the effects of US monetary policy using sign and zero restrictions on the monetary policy reaction function. The new algorithms are particularly useful when a rich set of sign restrictions substantially truncates the identified set given the zero restrictions.
This paper studies a regularized support function estimator for bounds on components of the parameter vector in the case in which the identified set is a polygon. The proposed regularized estimator has three important properties: (i) it has a uniform asymptotic Gaussian limit in the presence of flat faces in the absence of redundant (or overidentifying) constraints (or vice versa); (ii) the bias from regularization does not enter the first-order limiting distribution; (iii) the estimator remains consistent for the sharp (non-enlarged) identified set for the individual components even in the non-regular case. These properties are used to construct \emph{uniformly valid} confidence sets for an element $θ_{1}$ of a parameter vector $θ\in\mathbb{R}^{d}$ that is partially identified by affine moment equality and inequality conditions. The proposed confidence sets can be computed as a solution to a small number of linear and convex quadratic programs, which substantially decreases computation time and guarantees a global optimum. As a result, the method provides uniformly valid inference in applications in which the dimension of the parameter space, $d$, and the number of inequalit
We extend the constrained maximum likelihood estimation theory for parameters of a completely identified model, proposed by Aitchison and Silvey (1958), to parameters arising from a partially identified model. With a partially identified model, some parameters of the model may only be identified through constraints imposed by additional assumptions. We show that, under certain conditions, the constrained maximum likelihood estimator exists and locally maximizes the likelihood function subject to the constraints. We then study the asymptotic distribution of the estimator and propose a numerical algorithm for estimating parameters. We also discuss a special situation where exploiting additional assumptions does not improve estimation efficiency.
Recent results on identified hadrons from the PHENIX experiment in Au+Au collisions at mid-rapidity at $\sqrt{s_{NN}}$ = 200 GeV are presented. The centrality dependence of transverse momentum distributions and particle ratios for identified charged hadrons is studied. The transverse flow velocity and freeze-out temperature are extracted from $p_{T}$ spectra within the framework of a hydrodynamic collective flow model. Two-particle HBT correlations for charged pions are measured in different centrality selections for a broad range of transverse momentum of the pair. Results on elliptic flow measurements with respect to the reaction plane for identified particles are also presented.
Identified particles have long been of great interest at RHIC, in large part because of the baryon/meson differences observed at intermediate $p_T$ and the implications for hadronization via quark coalescence. With recent high-statistics data, identified particles are now also central to understanding the details of jet-medium interactions, energy loss, and hadron formation at intermediate and high $p_T$. In particular, high-$p_T$ identified particle spectra, along with two-particle correlations with hadrons triggered by direct photons, neutral pions, or electrons from heavy flavor decay, can provide information about how medium modifications to jet fragmentation depend on parton type. I will review recent results with identified particles both in heavy ion systems and in the reference measurements in p+p collisions.
The ALICE experiment features multiple particle identification systems. The measurement of the identified charged hadron $p_{t}$ spectra in proton-proton collisions at $\sqrt{s}=900$ GeV will be discussed. In the central rapidity region ($|η|<0.9$), particle identification and tracking are performed using the Inner Tracking System (ITS), which is the closest detector to the beam axis, the Time Projection Chamber (TPC), and a dedicated time-of-flight system (TOF). Particles are mainly identified using the energy loss signal in the ITS and TPC. In addition, the information from TOF is used to identify hadrons at higher momenta. Finally, the kink topology of the weak decay of charged kaons provides an alternative method to extract the transverse momentum spectra of charged kaons. This combination makes it possible to track and identify charged hadrons in the transverse momentum ($p_{t}$) range from 100 MeV/$c$ up to 2.5 GeV/$c$. Mesons containing strange quarks ($K^0_S$, $φ$) and both singly and doubly strange baryons ($\Lambda$, $\bar{\Lambda}$, and $\bar{\Xi}^{+} + \Xi^{-}$) are identified by their decay topology inside the TPC detector. Results obtained with the various identification tools described above and a comparison
In this article we establish the relation between the spines of 3-manifolds and the polyhedra with identified faces. We do this by showing that the spines of the closed, connected, orientable 3-manifolds can be presented through polyhedra with identified faces in a very natural way. We also prove the equivalence between the special spines and a certain type of polyhedra, and other related results.
The Identifying Code (IC) problem seeks a vertex subset whose intersection with every vertex's closed neighborhood is unique, enabling fault detection in multiprocessor systems and practical uses in identity verification, environmental monitoring, and dynamic localization. A closely related problem is the Locating-Dominating Set (LD), which requires each non-dominating vertex to be uniquely identified by its intersection with the set. Cappelle, Gomes, and Santos (2021) proved that LD is W-hard for minimum clique cover and lacks polynomial kernels for parameters such as vertex cover, but their methods did not apply to IC. This paper answers their question by showing that IC does not admit a polynomial kernel parameterized by solution size plus vertex cover unless NP is a subset of coNP/poly.
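The defining property of an identifying code is easy to verify for a given candidate set, even though finding a small one is hard. The sketch below checks the standard definition, which also requires every closed-neighborhood intersection to be nonempty (an assumption beyond the wording of the abstract). The function name and the adjacency-dictionary input format are hypothetical.

```python
def is_identifying_code(adj, code):
    """Check whether `code` is an identifying code of an undirected graph.

    adj maps each vertex to an iterable of its neighbors. The set `code`
    qualifies if N[v] & code is nonempty for every vertex v and differs
    between any two distinct vertices, where N[v] is the closed
    neighborhood of v.
    """
    code = set(code)
    seen = set()
    for v in adj:
        signature = frozenset((set(adj[v]) | {v}) & code)
        if not signature or signature in seen:
            return False
        seen.add(signature)
    return True

# Path 0 - 1 - 2 - 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

# Signatures {0,1}, {0,1,2}, {1,2}, {2} are nonempty and pairwise distinct.
ok = is_identifying_code(path, {0, 1, 2})
# Vertices 1 and 2 both see {1, 2}, so this set fails.
bad = is_identifying_code(path, {1, 2})

print(ok, bad)
```

The check runs in time roughly linear in the size of the graph times the size of the signatures, which is what makes solution size a natural parameter for the kernelization question raised in the abstract.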