Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. With such data science practices in place, reporting can become more timely, more insightful, and more flexible. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and the machine learning techniques that support them. In particular, changes in data sources are inevitable and pose significant risks that are crucial to address in the context of machine learning for official statistics. This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in the context of machine learning for official statistics. We provide a checklist of the most prevalent origins and causes of changing data sources, not only on a technical level but also regarding ownership, ethics, regulation, and public perception. Next, we highlight the repercussions of changing data sources on statistical reporting. These include technical effects such as concept drift, bias, availability, validity, and accuracy.
We obtain two theorems extending the use of a saddlepoint approximation to multiparameter problems for likelihood ratio-like statistics, which allows their use in permutation and rank tests and could allow their use in bootstrap approximations. In the first, we show that in some cases when no density exists, the integral of the formal saddlepoint density over the set corresponding to large values of the likelihood ratio-like statistic approximates the true probability with relative error of order $1/n$. In the second, we give multivariate generalizations of the Lugannani--Rice and Barndorff-Nielsen or $r^*$ formulas for the approximations. These theorems are applied to obtain permutation tests based on the likelihood ratio-like statistics for the $k$-sample and the multivariate two-sample cases. Numerical examples are given to illustrate the high degree of accuracy, and these statistics are compared to the classical statistics in both cases.
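For orientation, a schematic univariate statement of the two formulas being generalized (not the paper's multivariate result) reads as follows, with $K$ the cumulant generating function, $\hat\theta$ the saddlepoint solving $K'(\hat\theta)=x$, and $\Phi$, $\phi$ the standard normal distribution and density:
\[
P(\bar X_n \ge x) \approx 1-\Phi(\hat w)+\phi(\hat w)\Big(\frac{1}{\hat u}-\frac{1}{\hat w}\Big) \approx 1-\Phi\Big(\hat w+\frac{1}{\hat w}\log\frac{\hat u}{\hat w}\Big),
\]
where $\hat w=\operatorname{sgn}(\hat\theta)\sqrt{2n\{\hat\theta x-K(\hat\theta)\}}$ and $\hat u=\hat\theta\sqrt{nK''(\hat\theta)}$; the first display is the Lugannani--Rice form and the second the Barndorff-Nielsen $r^*$ form, and the multivariate theorems replace these ingredients with likelihood ratio-like analogues.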
Two-sample $U$-statistics are widely used in a broad range of applications, including those in the fields of biostatistics and econometrics. In this paper, we establish sharp Cramér-type moderate deviation theorems for Studentized two-sample $U$-statistics in a general framework, including the two-sample $t$-statistic and Studentized Mann-Whitney test statistic as prototypical examples. In particular, a refined moderate deviation theorem with second-order accuracy is established for the two-sample $t$-statistic. These results extend the applicability of the existing statistical methodologies from the one-sample $t$-statistic to more general nonlinear statistics. Applications to two-sample large-scale multiple testing problems with false discovery rate control and the regularized bootstrap method are also discussed.
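To fix ideas, a Cramér-type moderate deviation theorem of the kind obtained here asserts, schematically, that a Studentized statistic $T_n$ satisfies
\[
\frac{P(T_n \ge x)}{1-\Phi(x)} \to 1 \quad \text{uniformly in } 0\le x \le o(n^{1/6}),
\]
under suitable moment conditions (the admissible range and error terms depend on the statistic); it is this uniform control of the relative error deep in the tails that justifies normal calibration in large-scale multiple testing with false discovery rate control.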
We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels.
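The core correspondence can be stated in one line. For a semimetric $\rho$ of negative type and any fixed base point $z_0$, the induced distance kernel and the resulting identity are
\[
k(x,y)=\tfrac12\{\rho(x,z_0)+\rho(y,z_0)-\rho(x,y)\},\qquad \mathcal E_\rho(P,Q)=2\,\mathrm{MMD}_k^2(P,Q),
\]
where $\mathcal E_\rho(P,Q)=2\,\mathbb E\rho(X,Y)-\mathbb E\rho(X,X')-\mathbb E\rho(Y,Y')$ for $X,X'\sim P$ and $Y,Y'\sim Q$, all independent.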
I first met Leo Breiman in 1979 at the beginning of his third career, Professor of Statistics at Berkeley. He obtained his PhD with Loève at Berkeley in 1957. His first career was as a probabilist in the Mathematics Department at UCLA. After distinguished research, including the Shannon--McMillan--Breiman Theorem, and getting tenure, he decided that his real interest was in applied statistics, so he resigned his position at UCLA and set up as a consultant. Before doing so he produced two classic texts, Probability, now reprinted as a SIAM Classic in Applied Mathematics, and Statistics. Both books reflected his strong opinion that intuition and rigor must be combined. He expressed this in his probability book, which he viewed as a combination of his learning the right hand of probability, rigor, from Loève, and the left hand, intuition, from David Blackwell.
With the possible exception of gambling, meteorology, particularly precipitation forecasting, may be the area through which the general public is most familiar with probabilistic assessments of uncertainty. Despite the heavy use of stochastic models and statistical methods in weather forecasting and other areas of the atmospheric sciences, papers in these areas have traditionally been somewhat uncommon in statistics journals. We see signs of this changing in recent years, and we have sought to highlight some present research directions at the interface of statistics and the atmospheric sciences in this special section.
The development of wavelet theory has in recent years spawned applications in signal processing, in fast algorithms for integral transforms, and in image and function representation methods. This last application has stimulated interest in wavelet applications to statistics and to the analysis of experimental data, with many successes in the efficient analysis, processing, and compression of noisy signals and images. This is a selective review article that attempts to synthesize some recent work on ``nonlinear'' wavelet methods in nonparametric curve estimation and their role in a variety of applications. After a short introduction to wavelet theory, we discuss in detail several wavelet shrinkage and wavelet thresholding estimators, scattered in the literature and developed, under more or less standard settings, for density estimation from i.i.d. observations or to denoise data modeled as observations of a signal with additive noise. Most of these methods are fitted into the general concept of regularization with appropriately chosen penalty functions. A narrow range of applications in major areas of statistics is also discussed, such as partial linear regression models.
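As a concrete instance of the shrinkage rules reviewed, the hard- and soft-thresholding estimators of an empirical wavelet coefficient $d_{jk}$ are
\[
\hat d_{jk}^{\mathrm H}=d_{jk}\,\mathbf 1\{|d_{jk}|>\lambda\},\qquad \hat d_{jk}^{\mathrm S}=\operatorname{sgn}(d_{jk})\,(|d_{jk}|-\lambda)_+,
\]
with, for example, the universal threshold $\lambda=\hat\sigma\sqrt{2\log n}$ of Donoho and Johnstone; in the regularization view, these rules arise from $\ell_0$- and $\ell_1$-type penalties, respectively.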
We consider the detection of multivariate spatial clusters in the Bernoulli model with $N$ locations, where the design distribution has weakly dependent marginals. The locations are scanned with a rectangular window with sides parallel to the axes and with varying sizes and aspect ratios. Multivariate scan statistics pose a statistical problem due to the multiple testing over many scan windows, as well as a computational problem because statistics have to be evaluated on many windows. This paper introduces methodology that leads to both statistically optimal inference and computationally efficient algorithms. The main difference from the traditional calibration of scan statistics is the concept of grouping scan windows according to their sizes, and then applying different critical values to different groups. It is shown that this calibration of the scan statistic results in optimal inference for spatial clusters on both small scales and large scales, as well as in the case where the cluster lives on one of the marginals. Methodology is introduced that allows for an efficient approximation of the set of all rectangles while still guaranteeing the statistical optimality results described above.
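Schematically (this is a sketch of the idea, not the paper's exact calibration), the window collection $\mathcal W$ is partitioned into size groups $\mathcal G_1,\dots,\mathcal G_m$, and one rejects when
\[
\max_{W\in\mathcal G_j} T(W) > \kappa_j \quad\text{for some } j\in\{1,\dots,m\},
\]
with group-specific critical values $\kappa_j$ chosen so that the significance levels allocated to the groups sum to the overall level $\alpha$; the many small windows then receive larger critical values than the few large ones, which is what makes optimal inference possible simultaneously across scales.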
Statistical depth measures the centrality of a point with respect to a given distribution or data cloud. It provides a natural center-outward ordering of multivariate data points and yields a systematic nonparametric multivariate analysis scheme. In particular, the half-space depth is shown to have many desirable properties and broad applicability. However, the empirical half-space depth is zero outside the convex hull of the data. This property has rendered the empirical half-space depth useless outside the data cloud and limited its utility in applications where the extreme outlying probability mass is the focal point, such as in classification problems and control charts with very small false alarm rates. To address this issue, we apply extreme value statistics to refine the empirical half-space depth in "the tail." This provides an important linkage between data depth, which is useful for inference on centrality, and extreme value statistics, which is useful for inference on extremity. The refined empirical half-space depth can thus extend all its utilities beyond the data cloud, and hence greatly broaden its applicability.
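Recall that the half-space (Tukey) depth of a point $x\in\mathbb R^d$ with respect to a probability measure $P$ is
\[
\mathrm{HD}(x;P)=\inf_{\|u\|=1} P\{u^\top X\ge u^\top x\},
\]
so the empirical version vanishes whenever some closed half-space with $x$ on its boundary contains no observations, which happens everywhere outside the convex hull of the data; roughly speaking, the refinement replaces the empirical tail probability in this infimum with an extreme-value-based estimate.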
We investigate the behavior of Fourier transforms for a wide class of nonstationary nonlinear processes. Asymptotic central and noncentral limit theorems are established for a class of nondegenerate and degenerate weighted $V$-statistics through the angle of Fourier analysis. The established theory for $V$-statistics provides a unified treatment for many important time and spectral domain problems in the analysis of nonstationary time series, ranging from nonparametric estimation to the inference of periodograms and spectral densities.
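For reference, a weighted $V$-statistic of the type studied takes the schematic form
\[
V_n=\sum_{i=1}^n\sum_{j=1}^n a_n(i,j)\,K(X_i,X_j),
\]
for weights $a_n(i,j)$ and a kernel $K$; the statistic is called degenerate when the one-variable projections of the kernel vanish and nondegenerate otherwise, the two cases leading to the noncentral and central limit theorems, respectively.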
In my 2011 Annals of Applied Statistics article [Goerg (2011)] I wrote that "Whereas the Lambert $W$ function plays an important role in mathematics, physics, chemistry, biology and other fields, it has not yet been used in statistics." This was incorrect. At the time of publication I was unaware of Stehlík (2003), who used the Lambert $W$ function to derive the exact distribution of the likelihood ratio test statistic. He has also used it in more recent work such as Stehlík (2006), Stehlík et al. (2010), or Stehlík (2014) amongst others. While Stehlík's use of the Lambert $W$ function was focused on the distribution of the likelihood ratio test statistic, my work dealt with the modeling of skewed random variables and symmetrizing data using the Lambert $W$ function as a variable transformation.
Our data are random fields of multivariate Gaussian observations, and we fit a multivariate linear model with common design matrix at each point. We are interested in detecting those points where some of the coefficients are nonzero using classical multivariate statistics evaluated at each point. The problem is to find the $P$-value of the maximum of such a random field of test statistics. We approximate this by the expected Euler characteristic of the excursion set. Our main result is a very simple method for calculating this, which not only gives us the previous result of Cao and Worsley [Ann. Statist. 27 (1999) 925--942] for Hotelling's $T^2$, but also random fields of Roy's maximum root, maximum canonical correlations [Ann. Appl. Probab. 9 (1999) 1021--1057], multilinear forms [Ann. Statist. 29 (2001) 328--371], $\bar{\chi}^2$ [Statist. Probab. Lett. 32 (1997) 367--376, Ann. Statist. 25 (1997) 2368--2387] and $\chi^2$ scale space [Adv. in Appl. Probab. 33 (2001) 773--793]. The trick involves approaching the problem from the point of view of Roy's union-intersection principle. The results are applied to a problem in shape analysis where we look for brain damage due to nonmissile trauma.
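The underlying approximation, accurate for high thresholds $u$, is
\[
P\Big(\max_{t\in S} T(t)\ge u\Big)\approx \mathbb E\,\chi\big(\{t\in S: T(t)\ge u\}\big),
\]
where $\chi$ denotes the Euler characteristic of the excursion set: for large $u$ that set is, with high probability, either empty or a single connected component, so its Euler characteristic essentially indicates whether the maximum exceeds $u$.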
Spectral sampling is associated with the group of unitary transformations acting on matrices in much the same way that simple random sampling is associated with the symmetric group acting on vectors. This parallel extends to symmetric functions, $k$-statistics and polykays. We construct spectral $k$-statistics as unbiased estimators of cumulants of trace powers of a suitable random matrix. Moreover, we define normalized spectral polykays in such a way that when the sampling is from an infinite population they return products of free cumulants.
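For orientation, the classical $k$-statistics are the symmetric unbiased estimators of cumulants, $\mathbb E[k_r]=\kappa_r$; for instance,
\[
k_1=\bar X,\qquad k_2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)^2,
\]
and the polykays extend this to unbiased estimation of products of cumulants. The spectral $k$-statistics constructed here play the analogous role for cumulants of trace powers under unitary averaging.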
The classical theory of rank-based inference is entirely based either on ordinary ranks, which do not allow for considering location (intercept) parameters, or on signed ranks, which require an assumption of symmetry. If the median, in the absence of a symmetry assumption, is considered as a location parameter, the maximal invariance property of ordinary ranks is lost to the ranks and the signs. This new maximal invariant thus suggests a new class of statistics, based on ordinary ranks and signs. An asymptotic representation theory à la Hájek is developed here for such statistics, both in the nonserial and in the serial case. The corresponding asymptotic normality results clearly show how the signs add a separate contribution to the asymptotic variance, hence, potentially, to asymptotic efficiency. As shown by Hallin and Werker [Bernoulli 9 (2003) 137--165], conditioning in an appropriate way on the maximal invariant potentially even leads to semiparametrically efficient inference. Applications to semiparametric inference in regression and time series models with median restrictions are treated in detail in an upcoming companion paper.
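In the simplest i.i.d. setting, the invariance at play can be sketched as follows (a reading of the setup, not the paper's exact statement): under the group of continuous, strictly increasing transformations $g$ with $g(0)=0$, which preserve the median restriction, a maximal invariant is
\[
\big(\operatorname{sign}(X_1),\dots,\operatorname{sign}(X_n);\,R_1,\dots,R_n\big),
\]
the signs together with the ordinary ranks $R_i$, which is what motivates the class of statistics based on ranks and signs.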
Generalized likelihood ratio (GLR) test statistics are often used in the detection of spatial clustering in case-control and case-population datasets to check for a significantly large proportion of cases within some scanning window. The traditional spatial scan test statistic takes the supremum GLR value over all windows, whereas the average likelihood ratio (ALR) test statistic that we consider here takes an average of the GLR values. Numerical experiments in the literature and in this paper show that the ALR test statistic has more power than the spatial scan statistic. We develop in this paper accurate tail probability approximations of the ALR test statistic that allow us to bypass computer-intensive Monte Carlo procedures to estimate $p$-values. In models that adjust for covariates, these Monte Carlo evaluations require an initial fitting of parameters that can result in very biased $p$-value estimates.
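In schematic form, writing $\mathrm{GLR}(W)$ for the generalized likelihood ratio of scanning window $W$ over a window collection $\mathcal W$, the two statistics being compared are
\[
T_{\mathrm{scan}}=\sup_{W\in\mathcal W}\mathrm{GLR}(W),\qquad T_{\mathrm{ALR}}=\sum_{W\in\mathcal W} w(W)\,\mathrm{GLR}(W),
\]
for suitable weights $w(\cdot)$; the tail approximations developed here target the upper tail of $T_{\mathrm{ALR}}$ directly, avoiding Monte Carlo calibration.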
Let $\mathbf{Q}=(Q_1,\ldots,Q_n)$ be a random vector drawn from the uniform distribution on the set of all $n!$ permutations of $\{1,2,\ldots,n\}$. Let $\mathbf{Z}=(Z_1,\ldots,Z_n)$, where $Z_j$ is the mean zero, variance one random variable obtained by centering and normalizing $Q_j$, $j=1,\ldots,n$. Assume that $\mathbf{X}_i, i=1,\ldots,p$ are i.i.d. copies of $\frac{1}{\sqrt{p}}\mathbf{Z}$ and $X=X_{p,n}$ is the $p\times n$ random matrix with $\mathbf{X}_i$ as its $i$th row. Then $S_n=XX^*$ is called the $p\times n$ Spearman's rank correlation matrix, which can be regarded as a high-dimensional extension of the classical nonparametric statistic Spearman's rank correlation coefficient between two independent random variables. In this paper, we establish a CLT for the linear spectral statistics of this nonparametric random matrix model in the scenario of high dimension, namely, $p=p(n)$ and $p/n\to c\in(0,\infty)$ as $n\to\infty$. We propose a novel evaluation scheme to estimate the core quantity in Anderson and Zeitouni's cumulant method in [Ann. Statist. 36 (2008) 2553--2576] to bypass the so-called joint cumulant summability. In addition, we introduce a two-step comparison approach.
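Concretely, for a test function $f$ and the eigenvalues $\lambda_1,\dots,\lambda_p$ of $S_n$, the linear spectral statistic in question is
\[
\mathcal L_n(f)=\sum_{i=1}^p f(\lambda_i),
\]
and the CLT describes the Gaussian fluctuations of $\mathcal L_n(f)$ around a deterministic centering given by $p$ times the integral of $f$ against the limiting spectral distribution, here of Marchenko--Pastur type with ratio index $c$.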
In this paper, we derive valid Edgeworth expansions for studentized versions of a large class of statistics when the data are generated by a strongly mixing process. Under dependence, the asymptotic variance of such a statistic is given by an infinite series of lag-covariances, and therefore, studentizing factors (i.e., estimators of the asymptotic standard error) typically involve an increasing number, say $\ell$, of lag-covariance estimators, which are themselves quadratic functions of the observations. The unboundedness of the dimension $\ell$ of these quadratic functions makes the derivation and the form of the expansions nonstandard. It is shown that, in contrast to the case of studentized means under independence, the derived Edgeworth expansion is a superposition of three distinct series, respectively given by one in powers of $n^{-1/2}$, one in powers of $[n/\ell]^{-1/2}$ (resulting from the standard error of the studentizing factor) and one in powers of the bias of the studentizing factor, where $n$ denotes the sample size.
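Schematically (a sketch of the structure just described, not the precise expansion), the result for a studentized statistic $T_n$ superposes the three series:
\[
P(T_n\le x)=\Phi(x)+\phi(x)\Big\{\sum_{j\ge1} n^{-j/2}\,p_j(x)+\sum_{j\ge1}(\ell/n)^{j/2}\,q_j(x)+\sum_{j\ge1} b_n^{\,j}\,r_j(x)\Big\}+\text{remainder},
\]
where $b_n$ denotes the bias of the studentizing factor and $p_j,q_j,r_j$ are polynomials; for studentized means under independence, only the first series appears.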
A general rate estimation method is proposed that is based on studying the in-sample evolution of appropriately chosen diverging/converging statistics. The proposed rate estimators are based on simple least squares arguments, and are shown to be accurate in a very general setting without requiring the choice of a tuning parameter. The notion of scanning is introduced with the purpose of extracting useful subsamples of the data series; the proposed rate estimation method is applied to different scans, and the resulting estimators are then combined to improve accuracy. Applications to heavy tail index estimation as well as to the problem of estimating the long memory parameter are discussed; a small simulation study complements our theoretical results.
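The least squares idea can be made explicit: if a statistic computed from the first $k$ observations behaves like $T_k\approx Ck^{\beta}$, then over subsample sizes $k_1<\cdots<k_m$ extracted by a scan, the rate exponent is estimated by the slope of a log--log regression,
\[
(\hat a,\hat\beta)=\operatorname*{arg\,min}_{a,\beta}\sum_{i=1}^m\big(\log|T_{k_i}|-a-\beta\log k_i\big)^2,
\]
and the estimators obtained from different scans are then combined (e.g., averaged) to improve accuracy.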
The purpose of this paper is to investigate and develop methods for the analysis of multi-center randomized clinical trials which rely only on the randomization process as a basis of inference. Our motivation is prompted by the fact that most current statistical procedures used in the analysis of randomized multi-center studies are model-based. The randomization feature of the trials is usually ignored. An important characteristic of model-based analysis is that it is straightforward to model covariates. Nevertheless, in nearly all model-based analyses, the effects due to different centers and, in general, the design of the clinical trials are ignored. An alternative to a model-based analysis is to have analyses guided by the design of the trial. Our development of design-based methods allows the incorporation of centers as well as other features of the trial design. The methods make use of conditioning on the ancillary statistics in the sample space generated by the randomization process. We have investigated the power of the methods and have found that, in the presence of center variation, there is a significant increase in power. The methods have been extended to group sequential trials.
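A design-based $p$-value of the kind described is computed by conditioning on the realized design: if $\Pi$ denotes the set of treatment assignments attainable under the trial's center-stratified randomization scheme and $T$ is the chosen test statistic, then, schematically,
\[
p=\frac{1}{|\Pi|}\sum_{\pi\in\Pi}\mathbf 1\{T(\pi)\ge T_{\mathrm{obs}}\},
\]
so centers and other design features enter the inference through the reference set $\Pi$ rather than through a model.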
What makes a problem suitable for statistical analysis? Are historical and religious questions addressable using statistical calculations? Such issues have long been debated in the statistical community, and statisticians and others have used historical information and texts to analyze such questions as the economics of slavery, the authorship of the Federalist Papers and the question of the existence of God. But what about historical and religious attributions associated with information gathered from archaeological finds? In 1980, a construction crew working in the Jerusalem neighborhood of East Talpiot stumbled upon a crypt. Archaeologists from the Israel Antiquities Authority came to the scene and found 10 limestone burial boxes, known as ossuaries, in the crypt. Six of these had inscriptions. The remains found in the ossuaries were reburied, as required by Jewish religious tradition, and the ossuaries were catalogued and stored in a warehouse. The inscriptions on the ossuaries were catalogued and published by Rahmani (1994) and by Kloner (1996), but their reports did not receive widespread public attention. Fast forward to March 2007, when a television ``docudrama'' aired on the Discovery Channel.