Found 20 results
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative "big picture" depiction.
Statistical graphics and data visualization have long histories, but their modern forms began only in the early 1800s. Between roughly 1850 and 1900 ($\pm10$), an explosive growth occurred in both the general use of graphic methods and the range of topics to which they were applied. Innovations were prodigious and some of the most exquisite graphics ever produced appeared, resulting in what may be called the ``Golden Age of Statistical Graphics.'' In this article I trace the origins of this period in terms of the infrastructure required to produce this explosive growth: recognition of the importance of systematic data collection by the state; the rise of statistical theory and statistical thinking; enabling developments of technology; and inventions of novel methods to portray statistical data. To illustrate, I describe some specific contributions that give rise to the appellation ``Golden Age.''
In a quantum universe with a strong arrow of time, it is standard to postulate that the initial wave function started in a particular macrostate--the special low-entropy macrostate selected by the Past Hypothesis. Moreover, there is an additional postulate about statistical mechanical probabilities according to which the initial wave function is a "typical" choice in the macrostate (the Statistical Postulate). Together, they support a probabilistic version of the Second Law of Thermodynamics: typical initial wave functions will increase in entropy. Hence, there are two sources of randomness in such a universe: the quantum-mechanical probabilities of the Born rule and the statistical mechanical probabilities of the Statistical Postulate. I propose a new way to understand time's arrow in a quantum universe. It is based on what I call the Thermodynamic Theories of Quantum Mechanics. According to this perspective, there is a natural choice for the initial quantum state of the universe, which is given not by a wave function but by a density matrix. The initial density matrix of the universe is exactly the (normalized) projection operator onto the Past Hypothesis subspace (of the Hilbert space).
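In symbols (notation introduced here for illustration, not taken from the abstract), the proposed initial state is the normalized projection onto the Past Hypothesis subspace:
\[
\hat{\rho}_{\mathrm{PH}} \;=\; \frac{\hat{P}_{\mathrm{PH}}}{\operatorname{tr}\hat{P}_{\mathrm{PH}}} \;=\; \frac{\hat{P}_{\mathrm{PH}}}{\dim \mathcal{H}_{\mathrm{PH}}},
\]
where \(\hat{P}_{\mathrm{PH}}\) denotes the projection operator onto the Past Hypothesis subspace \(\mathcal{H}_{\mathrm{PH}}\) of the total Hilbert space.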
Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint only informs us how well the solution is approximated numerically but overlooks the sampling nature of the data. In contrast, recognizing the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the solution obtained using iterative optimization methods. This paper makes progress in this direction by introducing moment-adjusted stochastic gradient descent, a new stochastic optimization method for statistical inference. We establish non-asymptotic theory that characterizes the statistical distribution for certain iterative methods with optimization guarantees. On the statistical front, the theory allows for model mis-specification, with very mild conditions on the data. For optimization, the theory is flexible for both convex and non-convex cases. Remarkably, the moment-adjusting idea, motivated by "error standardization" in statistics, achieves a similar effect as acceleration in first-order optimization methods used to fit generalized linear models. We also demonstrate ...
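As a rough sketch of the "error standardization" intuition only (the function, update rule, and toy data below are assumptions for illustration, not the authors' algorithm), one can rescale each stochastic gradient by a running estimate of its second moment:

```python
import numpy as np

def moment_adjusted_sgd(X, y, steps=2000, lr=0.1, eps=1e-8, seed=0):
    """Illustrative sketch: SGD for least squares where each stochastic
    gradient is rescaled by a running estimate of its second moment
    ("error standardization"); not the paper's exact procedure."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    second_moment = np.zeros(d)           # running estimate of E[g_j^2]
    for t in range(1, steps + 1):
        i = rng.integers(n)               # sample one observation
        resid = X[i] @ theta - y[i]
        g = resid * X[i]                  # stochastic gradient of 0.5*(x'theta - y)^2
        second_moment += (g**2 - second_moment) / t
        theta -= lr / np.sqrt(t) * g / np.sqrt(second_moment + eps)  # moment-adjusted step
    return theta

# toy usage with a synthetic linear model
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
print(moment_adjusted_sgd(X, y))
```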
Cartogram drawing is a technique for showing geography-related statistical information, such as demographic and epidemiological data. The idea is to distort a map by resizing its regions according to a statistical parameter while keeping the map recognizable. This article describes an R package implementing an algorithm called RecMap, which approximates every map region by a rectangle whose area corresponds to the given statistical value (maintaining zero cartographic error). The package implements the computationally intensive tasks in C++. The paper's contribution is to demonstrate, on real and synthetic maps, how recmap is implemented and how it can be used together with other statistical packages.
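A minimal sketch of the rectangle idea (hypothetical regions and a helper function of my own; not the recmap package's actual interface): each region keeps its aspect ratio while the rectangle's area is made exactly proportional to the statistical value, so the area error is zero.

```python
import math

def rectangle_for_region(width, height, value, area_per_unit):
    """Return (new_width, new_height) of a rectangle with the original
    aspect ratio whose area equals value * area_per_unit (zero area error)."""
    target_area = value * area_per_unit
    aspect = width / height
    new_height = math.sqrt(target_area / aspect)
    new_width = aspect * new_height
    return new_width, new_height

# hypothetical regions: (bounding-box width, height, statistic such as population)
regions = {"A": (4.0, 2.0, 120.0), "B": (3.0, 3.0, 30.0)}
for name, (w, h, z) in regions.items():
    print(name, rectangle_for_region(w, h, z, area_per_unit=0.05))
```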
When Lenz proposed a simple model for phase transitions in magnetism, he couldn't have imagined that the "Ising model" was to become a jewel in the field of equilibrium statistical mechanics. Its role spans the spectrum, from a good pedagogical example to a universality class in critical phenomena. A quarter century ago, Katz, Lebowitz and Spohn found a similar treasure. By introducing a seemingly trivial modification to the Ising lattice gas, they took it into the vast realms of non-equilibrium statistical mechanics. An abundant variety of unexpected behavior emerged and caught many of us by surprise. We present a brief review of some of the new insights garnered and some of the outstanding puzzles, as well as speculate on the model's role in the future of non-equilibrium statistical physics.
This is an invited contribution to the discussion on Professor Deborah Mayo's paper, "On the Birnbaum argument for the strong likelihood principle," to appear in Statistical Science. Mayo clearly demonstrates that statistical methods violating the likelihood principle need not violate either the sufficiency or conditionality principle, thus refuting Birnbaum's claim. With the constraints of Birnbaum's theorem lifted, we revisit the foundations of statistical inference, focusing on some new foundational principles, the inferential model framework, and connections with sufficiency and conditioning. [arXiv:1302.7021]
Visual insights into a wide variety of statistical methods, for both didactic and data analytic purposes, can often be achieved through geometric diagrams and geometrically based statistical graphs. This paper extols and illustrates the virtues of the ellipse and her higher-dimensional cousins for both these purposes in a variety of contexts, including linear models, multivariate linear models and mixed-effect models. We emphasize the strong relationships among statistical methods, matrix-algebraic solutions and geometry that can often be easily understood in terms of ellipses.
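As a small illustration of the matrix-algebra-to-geometry connection described above (example covariance chosen arbitrarily), the axes of a covariance ellipse fall directly out of an eigendecomposition:

```python
import numpy as np

def ellipse_axes(cov, level=2.0):
    """Half-axis lengths and directions of the ellipse {x : x' cov^{-1} x = level^2}.
    Axis directions are the eigenvectors of cov; half-lengths are level*sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    half_lengths = level * np.sqrt(eigvals)
    return half_lengths, eigvecs

# example: covariance matrix of two correlated variables
cov = np.array([[4.0, 2.4],
                [2.4, 3.0]])
lengths, directions = ellipse_axes(cov)
print("half-axis lengths:", lengths)
print("axis directions (columns):", directions)
```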
Software design impacts much of statistical analysis, and as technology has changed dramatically in recent years, it is exciting to learn how statistical software is adapting and changing. This leads to the collection of papers published here, written by John Chambers, Duncan Temple Lang, Michael Lawrence, Martin Morgan, Yihui Xie, Heike Hofmann and Xiaoyue Cheng.
We consider the Feller mean-reverting square-root diffusion, which has been applied to model a wide variety of processes with linearly state-dependent diffusion, such as stochastic volatility and interest rates in finance, and neuronal and population dynamics in the natural sciences. We focus on the statistical mixing (or superstatistical) process in which the parameter related to the mean value can fluctuate, a plausible mechanism for the emergence of heavy-tailed distributions. We obtain analytical results for the associated probability density function (both stationary and time dependent), its correlation structure and aggregation properties. Our results are applied to explain the statistics of stock traded volume at different aggregation scales.
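A minimal simulation sketch, assuming an Euler-Maruyama discretization and illustrative parameter values (not the paper's analytical treatment): a Feller square-root diffusion whose long-run mean is itself drawn at random across realizations, the kind of statistical mixing that can produce heavy tails.

```python
import numpy as np

def feller_path(theta, kappa, sigma, x0, dt=1e-3, n_steps=5000, rng=None):
    """Euler-Maruyama simulation of dX = kappa*(theta - X) dt + sigma*sqrt(X) dW,
    reflecting at zero to keep the state nonnegative."""
    rng = rng or np.random.default_rng()
    x = np.empty(n_steps)
    x[0] = x0
    for t in range(1, n_steps):
        dw = rng.normal(scale=np.sqrt(dt))
        x[t] = abs(x[t-1] + kappa * (theta - x[t-1]) * dt + sigma * np.sqrt(x[t-1]) * dw)
    return x

# superstatistical mixing: the long-run mean theta fluctuates across realizations
rng = np.random.default_rng(0)
samples = np.concatenate([
    feller_path(theta=rng.gamma(shape=2.0, scale=0.5), kappa=5.0, sigma=0.3,
                x0=1.0, rng=rng)[-1000:]
    for _ in range(50)
])
print("mixed-sample mean and std:", samples.mean(), samples.std())
```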
Hospital profiling involves a comparison of a health care provider's structure, processes of care, or outcomes to a standard, often in the form of a report card. Given the ubiquity of report cards and similar consumer ratings in contemporary American culture, it is notable that these are a relatively recent phenomenon in health care. Prior to the 1986 release of Medicare hospital outcome data, little such information was publicly available. We review the historical evolution of hospital profiling with special emphasis on outcomes; present a detailed history of cardiac surgery report cards, the paradigm for modern provider profiling; discuss the potential unintended negative consequences of public report cards; and describe various statistical methodologies for quantifying the relative performance of cardiac surgery programs. Outstanding statistical issues are also described.
Statistical comparisons of electoral variables are made between groups of electronic voting machines and voting centers classified by type of transmission, according to the volume of incoming and outgoing data traffic between the machines and the National Electoral Council (CNE) totalizing servers. One unexpectedly finds two types of behavior in wire telephony data transmissions and only one type where cellular telephony is employed, contravening any reasonable electoral regulation. Differentiation in data transmissions arises when comparing the number of incoming and outgoing data bytes per machine against the total number of votes per machine reported officially by the CNE. The respective distributions of electoral variables for each type of transmission show that the groups so classified do not correspond to random sets of the electoral universe. In particular, the distributions of the NO percentage of votes per machine differ statistically across groups. The presidential elections of 1998 and 2000 and the 2004 Presidential Recall Referendum (2004 PRR) are compared according to the type of transmission in the 2004 PRR. Statistically, the difference between the empirical distributions ...
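The abstract does not name the test used; as one standard, purely illustrative way to compare empirical distributions across groups (with synthetic data standing in for the per-machine NO percentages), a two-sample Kolmogorov-Smirnov test could look like this:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
# hypothetical NO-vote percentages per machine for two transmission groups
group_a = rng.normal(loc=58, scale=10, size=400).clip(0, 100)
group_b = rng.normal(loc=52, scale=12, size=350).clip(0, 100)

stat, p_value = ks_2samp(group_a, group_b)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2e}")
```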
This special issue is a product of the First Interdisciplinary Symposium on Statistical Challenges and Opportunities in Electronic Commerce Research, which took place on May 22--23, 2005, at the Robert H. Smith School of Business, University of Maryland, College Park (\url{www.smith.umd.edu/dit/statschallenges/}). The symposium brought together, for the first time, researchers from statistics, information systems, and related fields, all of whom work or are interested in empirical research related to electronic commerce. The goal of the symposium was to cross the borders, discuss joint research opportunities, expose this field and its statistical challenges, and promote collaboration between the different fields.
Wildfire is an important process of the earth system that occurs across a wide range of spatial and temporal scales. A variety of methods have been used to predict wildfire phenomena during the past century, to better our understanding of fire processes and to inform fire and land management decision-making. Statistical methods have an important role in wildfire prediction due to the inherent stochastic nature of fire phenomena at all scales. Predictive models have exploited several sources of data describing fire phenomena. Experimental data are scarce; observational data are dominated by statistics compiled by government fire management agencies, primarily for administrative purposes, and increasingly by remote sensing observations. Fires are rare events at many scales. The data describing fire phenomena can be zero-heavy and nonstationary over both space and time. Users of fire modeling methodologies are mainly fire management agencies, often working under great time constraints; thus, complex models have to be estimated efficiently. We focus on providing an understanding of some of the information needed for fire management decision-making and of the challenges involved in prediction.
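Because the abstract stresses zero-heavy count data, here is a minimal sketch, with synthetic data and a simple zero-inflated Poisson model of my own choosing (not a model from the paper), of how such data can be fit by maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln, expit

rng = np.random.default_rng(0)
# synthetic zero-heavy "fire counts": with probability pi a cell produces no fires at all
pi_true, lam_true, n = 0.7, 3.0, 2000
structural_zero = rng.random(n) < pi_true
counts = np.where(structural_zero, 0, rng.poisson(lam_true, n))

def neg_loglik(params):
    """Negative log-likelihood of a zero-inflated Poisson(pi, lam),
    parameterized on the real line via logit(pi) and log(lam)."""
    pi, lam = expit(params[0]), np.exp(params[1])
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))
    ll_pos = np.log1p(-pi) - lam + counts * np.log(lam) - gammaln(counts + 1)
    return -np.sum(np.where(counts == 0, ll_zero, ll_pos))

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="BFGS")
print("estimated pi, lambda:", expit(fit.x[0]), np.exp(fit.x[1]))
```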
Rejoinder to "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet [arXiv:1208.3378].
Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among other factors, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs, including twin studies, family studies, linkage analysis, and, more recently, genome-wide association studies, have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.
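As a toy illustration only (hypothetical allele counts; real association studies must handle the complications listed above), the simplest case-control association test is a chi-square test on an allele-by-status table:

```python
from scipy.stats import chi2_contingency

# hypothetical allele counts: rows = cases/controls, columns = allele A / allele a
table = [[420, 580],   # cases
         [350, 650]]   # controls
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```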
Discussion of "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet [arXiv:1208.3378].
Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single-end and paired-end RNA-Seq data and sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that using paired-end RNA-Seq provides more accurate isoform abundance estimates than single-end sequencing at fixed sequencing depth. Simulation studies are also given.
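A toy sketch of the flavor of such a model, using a simplified setup of my own (not the paper's actual likelihood): read counts in a few categories are Poisson with rates that are known mixtures of unknown isoform abundances, and the abundances are recovered by maximum likelihood.

```python
import numpy as np
from scipy.optimize import minimize

# toy setup: 3 read categories (e.g., exon regions), 2 isoforms.
# A[i, j] = sampling rate of category i from one unit of isoform j (assumed known).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
true_theta = np.array([5.0, 2.0])            # isoform abundances
rng = np.random.default_rng(0)
counts = rng.poisson(A @ true_theta * 100)   # observed read counts (depth factor 100)

def neg_loglik(log_theta):
    """Poisson negative log-likelihood in the isoform abundances (constants dropped)."""
    mu = A @ np.exp(log_theta) * 100
    return np.sum(mu - counts * np.log(mu))

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("estimated abundances:", np.exp(fit.x))
```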
The Dempster--Shafer (DS) theory is a powerful tool for probabilistic reasoning based on a formal calculus for combining evidence. DS theory has been widely used in computer science and engineering applications, but has yet to reach the statistical mainstream, perhaps because the DS belief functions do not satisfy long-run frequency properties. Recently, two of the authors proposed an extension of DS, called the weak belief (WB) approach, that can incorporate desirable frequency properties into the DS framework by systematically enlarging the focal elements. The present paper reviews and extends this WB approach. We present a general description of WB in the context of inferential models, its interplay with the DS calculus, and the maximal belief solution. New applications of the WB method in two high-dimensional hypothesis testing problems are given. Simulations show that the WB procedures, suitably calibrated, perform well compared to popular classical methods. Most importantly, the WB approach combines the probabilistic reasoning of DS with the desirable frequency properties of classical statistics.
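For readers unfamiliar with the DS calculus referred to above, a minimal sketch (toy frame and mass assignments chosen for illustration) of Dempster's rule for combining two bodies of evidence:

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions over frozenset focal elements,
    renormalizing by 1 - K, where K is the mass assigned to conflicting pairs."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# toy frame {H, not-H}: two pieces of evidence, each partly uncommitted
m1 = {frozenset({"H"}): 0.6, frozenset({"H", "not-H"}): 0.4}
m2 = {frozenset({"H"}): 0.3, frozenset({"not-H"}): 0.5, frozenset({"H", "not-H"}): 0.2}
print(dempster_combine(m1, m2))
```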
Howard Raiffa earned his bachelor's degree in mathematics, his master's degree in statistics and his Ph.D. in mathematics at the University of Michigan. Since 1957, Raiffa has been a member of the faculty at Harvard University, where he is now the Frank P. Ramsey Chair in Managerial Economics (Emeritus) in the Graduate School of Business Administration and the Kennedy School of Government. A pioneer in the creation of the field known as decision analysis, his research interests span statistical decision theory, game theory, behavioral decision theory, risk analysis and negotiation analysis. Raiffa has supervised more than 90 doctoral dissertations and written 11 books. His new book is Negotiation Analysis: The Science and Art of Collaborative Decision Making. Another book, Smart Choices, co-authored with his former doctoral students John Hammond and Ralph Keeney, was the CPR (formerly known as the Center for Public Resources) Institute for Dispute Resolution Book of the Year in 1998. Raiffa helped to create the International Institute for Applied Systems Analysis and later became its first Director, serving in that capacity from 1972 to 1975. His many honors and awards include ...