In the context of selective inference, confidence envelopes for the false discoveries allow the user to select any subset of null hypotheses while having a statistical guarantee on the number of false discoveries in the selected set. Many constructions of such envelopes have been proposed recently, using local test families (Genovese and Wasserman, 2006; Goeman and Solari, 2011), paths (Katsevich and Ramdas, 2020) or interpolation (Blanchard et al., 2020a). All those methods have in common that they have been well-studied for the homogeneous case where all p-values under the null have a uniform distribution over [0, 1]. However, in many applications the data are heterogeneous and discrete, hence the p-values have heterogeneous, discrete distributions, and the previous constructions may incur a loss of power, in the sense that they over-estimate the number of false discoveries. In this paper, we bridge the previous constructions under the homogeneous case with new tools. We also apply these tools to propose several confidence envelopes based on tools tailored for heterogeneous data, like the Bretagnolle inequality, or a new variant of the Simes inequality. We compare these new envel
Exploring new ideas is a fundamental aspect of research and development (R\&D), which often occurs in competitive environments. Most ideas are subsequent, i.e. one idea today leads to more ideas tomorrow. According to one approach, the best way to encourage exploration is by granting protection on discoveries to the first innovator. Correspondingly, only the one who made the first discovery can use the new knowledge and benefit from subsequent discoveries, which in turn should increase the initial motivation to explore. An alternative approach to promote exploration favors the \emph{sharing of knowledge} from discoveries among researchers, allowing explorers to use each other's discoveries to develop further knowledge, as in the open-source community. With no protection, all explorers have access to all existing discoveries and new directions are explored faster. We present a game theoretic analysis of an abstract research-and-application game which clarifies the expected advantages and disadvantages of the two approaches under full information. We then compare the theoretical predictions with the observed behavior of actual players in the lab who operate under partial informati
To better align theories of paradigm-shifting discoveries and empirics identifying them, we propose a novel measure that incorporates a discovery's impact, novelty, and tendency to break with the past into a single, coherent measure. Calibration using the National Inventor Hall of Fame data reveals that impact, novelty, and disruptiveness are strict complements, meaning, for example, that greater impact cannot substitute for moderate novelty. We illustrate the workings of the measure using data on USPTO patents from 1982 to 2015.
The LIGO-Virgo-KAGRA (LVK) Collaboration has made breakthrough discoveries in gravitational-wave astronomy, a new field that provides a different means of observing our Universe. Gravitational-wave discoveries are possible thanks to the work of thousands of people from across the globe working together. In this article, we discuss the range of engagement activities used to communicate LVK gravitational-wave discoveries and the stories of the people behind the science, using the activities surrounding the release of the third Gravitational-Wave Transient Catalog as a case study.
We present the Ultracool dwarf Science with MachIne LEarning (USMILE), a program developing machine-learning tools for the discovery and characterization of ultracool dwarfs. We introduce USMILE Avocado, a spectral classification framework that uses broadband photometry from wide-field surveys -- Rubin Observatory LSST Data Preview 1, VISTA Hemisphere Survey, and CatWISE -- as input features. The framework has two gradient-boosted decision-tree models scalable to the massive data volumes of modern surveys: the classifier, which distinguishes ultracool dwarfs from stellar/extragalactic contaminants, and the regressor, which predicts spectral types. A key strength is its ability to natively handle missing photometric features, whereas earlier machine-learning approaches required complete multi-band detections or relied on imputation, thereby excluding genuine ultracool dwarfs or introducing bias. Trained on an augmented labeled dataset of >2 million sources built from known ultracool dwarfs, reddened early-type stars, and quasars, the models achieve strong performance: the classifier attains an ROC AUC of 0.976 and an F1 score of 0.92, while the regressor yields a mean-squared err
Just as the chemical elements from hydrogen (Z = 1) to oganesson (Z = 118) once were discovered, so were the numerous isotopes. The histories of how the isotopes were discovered are less well known, but in a few cases they are as interesting and instructive as those of the elements figuring in the periodic table. Following an overview of criteria usually associated with the concept of discovery in general, this paper examines in detail the historical developments that led to the discoveries of deuterium and tritium and also, as a by-product, the helium-3 isotope. It also includes a brief section on the neutron, which in the 1920s, when it was still a hypothetical particle, was sometimes discussed together with the mass-2 and mass-3 hydrogen isotopes. The paper concludes with a discussion of priority questions relating to suggestions of the two heavy isotopes as well as to their actual discoveries.
We report new methods for evaluating realistic observing programs that search stars for planets by direct imaging, where observations are selected from an optimized star list, and where stars can be observed multiple times. We show how these methods bring critical insight into the design of the mission and its instruments. These methods provide an estimate of the outcome of the observing program: the probability distribution of discoveries (detection and/or characterization), and an estimate of the occurrence rate of planets (eta). We show that these parameters can be accurately estimated from a single mission simulation, without the need for a complete Monte Carlo mission simulation, and we prove the accuracy of this new approach. Our methods provide the tools to define a mission for a particular science goal, for example defined by the expected number of discoveries and its confidence level. We detail how an optimized star list can be built and how successive observations can be selected. Our approach also provides other critical mission attributes, such as the number of stars expected to be searched, and the probability of zero discoveries. Because these attributes dep
We have inspected all supernova discoveries reported during 2010 and 2011 (538 and 926 events, respectively). We examine the statistics of all discovered objects, as well as those of the subset of spectroscopically-confirmed events. In these two years we see the rise of wide-field non-targeted supernova surveys to prominence, with the largest numbers of events reported by the CRTS and PTF surveys (572 and 393 events in total respectively, contributing together 74% of all reported discoveries in 2011), followed by the integrated contribution of numerous amateurs (184 events). Among spectroscopically-confirmed events the PTF (393 events) leads, followed by CRTS (170 events), and amateur discoveries (144 events). Traditional galaxy-targeted surveys, such as LOSS and CHASE, maintain a strong contribution (86 and 61 events, respectively) with high spectroscopic completeness (~90 per cent). It is interesting to note that the community managed to provide substantial spectroscopic follow-up for relatively brighter amateur discoveries (<m>=16.5 mag), but significantly less help for fainter (and much more numerous) events promptly released by the CRTS (<m>=18.6 mag). Inspecting di
We present some new discoveries on the mathematical foundation of linear hydrodynamic stability theory. The new discoveries are: 1. The linearized Euler equations fail to provide a linear approximation to inviscid hydrodynamic stability. 2. The eigenvalue instability predicted by the high Reynolds number linearized Navier-Stokes equations cannot capture the dominant instability of super fast growth. 3. As equations for directional differentials, the Rayleigh equation and the Orr-Sommerfeld equation cannot capture the nature of the full differentials.
Recent tools for interactive data exploration significantly increase the chance that users make false discoveries. The crux is that these tools implicitly allow the user to test a large body of different hypotheses with just a few clicks, thus incurring the issue commonly known in statistics as the multiple hypothesis testing error. In this paper, we propose solutions to integrate multiple hypothesis testing control into interactive data exploration tools. A key insight is that existing methods for controlling false discoveries (such as false discovery rate, FDR, control) are not directly applicable to interactive data exploration. We therefore discuss a set of new control procedures that are better suited and integrate them in our system called Aware. By means of extensive experiments using both real-world and synthetic data sets, we demonstrate how Aware can help experts and novice users alike to efficiently control false discoveries.
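To make the multiple hypothesis testing error concrete, the following minimal Python sketch computes the chance of at least one false discovery when a user runs many tests in one exploration session. It assumes independent tests whose null hypotheses are all true, each run at level alpha. The function names are hypothetical, and the Bonferroni adjustment shown is a textbook correction used here only for contrast; it is not one of the control procedures integrated in Aware.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false discovery among `num_tests`
    independent tests of true null hypotheses, each run at level `alpha`.
    (Independence is an assumption made for this illustration.)"""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_level(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test level under the textbook Bonferroni correction,
    which keeps the family-wise error rate at or below `alpha`."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        uncorrected = familywise_error_rate(m, 0.05)
        corrected = familywise_error_rate(m, bonferroni_level(m, 0.05))
        print(f"{m:>3} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```

Under these assumptions, twenty uncorrected tests at level 0.05 already give a chance of roughly 64% of at least one false discovery, which is why a few clicks in an exploration tool can be enough to produce spurious findings.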
Many data mining and statistical machine learning algorithms have been developed to select a subset of covariates to associate with a response variable. Spurious discoveries can easily arise in high-dimensional data analysis due to the enormous number of possible selections. How can we know statistically that our discoveries are better than those made by chance? In this paper, we define a measure of goodness of spurious fit, which shows how well a response variable can be fitted by an optimally selected subset of covariates under the null model, and propose a simple and effective LAMM algorithm to compute it. It coincides with the maximum spurious correlation for linear models and can be regarded as a generalized maximum spurious correlation. We derive the asymptotic distribution of such goodness of spurious fit for generalized linear models and $L_1$ regression. Such an asymptotic distribution depends on the sample size, ambient dimension, the number of variables used in the fit, and the covariance information. It can be consistently estimated by multiplier bootstrapping and used as a benchmark to guard against spurious discoveries. It can also be applied to model selection, which considers only
Puzzles often give birth to great discoveries, and false discoveries sometimes stimulate exciting ideas in theoretical physics. Historical examples of both are described in the Introduction and in the section ``Cosmological Puzzles''. Of the existing puzzles, most attention is given to the Ultra High Energy Cosmic Ray (UHECR) puzzle and to the cosmological constant problem. The 40-year-old UHECR problem consisted in the absence of the sharp steepening in the spectrum of extragalactic cosmic rays caused by interaction with CMB radiation. This steepening is known as the Greisen-Zatsepin-Kuzmin (GZK) cutoff. It is demonstrated here that the features of the interaction of cosmic ray protons with the CMB are now seen in the spectrum in the form of the dip and the beginning of the GZK cutoff. The most serious cosmological problem is caused by the large vacuum energy of the known elementary-particle fields, which exceeds the cosmological vacuum energy by at least 45 orders of magnitude. The various ideas put forward to solve this problem during the last 40 years have weaknesses and cannot be accepted as the final solution of this puzzle. The anthropic approach is discussed.
The history of science reveals that major discoveries are not predictable. Naively, one might conclude therefore that it is not possible to artificially cultivate an environment that promotes discoveries. I suggest instead that open research without a programmatic agenda establishes a fertile ground for unexpected breakthroughs. Contrary to current practice, funding agencies should allocate a small fraction of their funds to support research in centers of excellence without programmatic reins tied to specific goals.
Fifty years ago, two scientists who celebrate their 80th birthdays in 2011, Alexander V. Voronel and Johannes V. Sengers, performed breakthrough experiments that challenged the commonly accepted views on critical phenomena in fluids. Voronel discovered that the isochoric heat capacity of argon becomes infinite at the vapor-liquid critical point. Almost simultaneously, Sengers observed a similar anomaly for the thermal conductivity of near-critical carbon dioxide. The existence of these singularities was later proved to be universal for all fluids. These experiments had a profound effect on the development of the modern (scaling) theory of phase transitions, which is based on the diverging fluctuations of the order parameter. In particular, the discovery of the heat-capacity divergence at the critical point was a keystone for the formulation of static scaling theory, while the discovery of the divergence of the thermal conductivity played an important role in the formulation of dynamic scaling and mode-coupling theory. Moreover, owing to the discoveries made by Voronel and Sengers 50 years ago, critical phenomena in fluids have become an integral part of contemporary condensed-matter
Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we present a structured workflow for using unsupervised learning techniques in science. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries. To illustrate our proposed workflow, we present a case study from astronomy, seeking to refine globular clusters of Milky Way stars based upon their chemical composition. Our case study highlights the importance of validation and illustrates how the benefits of a carefully-designed workflow for un
Recent studies have shown that deep learning models are vulnerable to membership inference attacks (MIAs), which aim to infer whether a data record was used to train a target model or not. To analyze and study these vulnerabilities, various MIA methods have been proposed. Despite the significance and popularity of MIAs, existing works on MIAs are limited in providing guarantees on the false discovery rate (FDR), which refers to the expected proportion of false discoveries among the identified positive discoveries. However, it is very challenging to ensure the false discovery rate guarantees, because the underlying distribution is usually unknown, and the estimated non-member probabilities often exhibit interdependence. To tackle the above challenges, in this paper, we design a novel membership inference attack method, which can provide the guarantees on the false discovery rate. Additionally, we show that our method can also provide the marginal probability guarantee on labeling true non-member data as member data. Notably, our method can work as a wrapper that can be seamlessly integrated with existing MIA methods in a post-hoc manner, while also providing the FDR control. We perf
Philosophers have spilled much ink over the discovery of ideas in the classical 'context of discovery'. However, there has been little engagement with the question of what constitutes a discovery of 'things in the world'. A much-overlooked answer to this question is provided by T.S. Kuhn. In this paper, I show that discoveries awarded with a Nobel Prize in Physics in the past 53 years accord with a basic premise of Kuhn's account and his distinction between two types of natural kind discoveries. I also draw normative conclusions for credit attribution in science.
New technologies have led to vast troves of large and complex datasets across many scientific domains and industries. People routinely use machine learning techniques to not only process, visualize, and make predictions from this big data, but also to make data-driven discoveries. These discoveries are often made using Interpretable Machine Learning, or machine learning models and techniques that yield human understandable insights. In this paper, we discuss and review the field of interpretable machine learning, focusing especially on the techniques as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using Interpretable Machine Learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation from both a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency an
The local false discovery rate (lfdr) of Efron et al. (2001) enjoys major conceptual and decision-theoretic advantages over the false discovery rate (FDR) as an error criterion in multiple testing, but is only well-defined in Bayesian models where the truth status of each null hypothesis is random. We define a frequentist counterpart to the lfdr based on the relative frequency of nulls at each point in the sample space. The frequentist lfdr is defined without reference to any prior, but preserves several important properties of the Bayesian lfdr: For continuous test statistics, $\text{lfdr}(t)$ gives the probability, conditional on observing some statistic equal to $t$, that the corresponding null hypothesis is true. Evaluating the lfdr at an individual test statistic also yields a calibrated forecast of whether its null hypothesis is true. Finally, thresholding the lfdr at $\frac{1}{1+\lambda}$ gives the best separable rejection rule under the weighted classification loss where Type I errors are $\lambda$ times as costly as Type II errors. The lfdr can be estimated efficiently using parametric or non-parametric methods, and a closely related error criterion can be provably controlled in finit
Wolf-Rayet stars (WRs) are evolved massive stars in the brief stage before they undergo core collapse. Not only are they rare, but they also can be particularly difficult to find due to the high extinction in the Galactic plane. This paper discusses the discovery of three new Galactic WRs that were previously classified as H$\alpha$ emission stars; thanks to Gaia spectra, we were able to identify the broad, strong emission lines that characterize WRs. Using the Lowell Discovery Telescope and the DeVeny spectrograph, we obtained spectra for each star. Two are WC9s, and the third is a WN6 + O6.5 V binary. The latter is a known eclipsing system with a 4.4 day period from ASAS-SN data. We calculate absolute visual magnitudes for all three stars to be between -7 and -6, which is consistent with our expectations of these subtypes. These discoveries highlight the incompleteness of the WR census in our local volume of the Milky Way and suggest the potential for future Galactic WR discoveries from Gaia low-dispersion spectra. Furthermore, radial velocity studies of the newly found binary will provide direct mass estimates and orbital parameters, adding to our knowledge of the role that binarity plays