Found 20 results
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative "big picture" depiction.
Statistical graphics and data visualization have long histories, but their modern forms began only in the early 1800s. Between roughly 1850 and 1900 ($\pm10$), an explosive growth occurred in both the general use of graphic methods and the range of topics to which they were applied. Innovations were prodigious and some of the most exquisite graphics ever produced appeared, resulting in what may be called the ``Golden Age of Statistical Graphics.'' In this article I trace the origins of this period in terms of the infrastructure required to produce this explosive growth: recognition of the importance of systematic data collection by the state; the rise of statistical theory and statistical thinking; enabling developments of technology; and inventions of novel methods to portray statistical data. To illustrate, I describe some specific contributions that give rise to the appellation ``Golden Age.''
Cartogram drawing is a technique for showing geography-related statistical information, such as demographic and epidemiological data. The idea is to distort a map by resizing its regions according to a statistical parameter while keeping the map recognizable. This article describes an R package implementing an algorithm called RecMap, which approximates every map region by a rectangle whose area corresponds to the given statistical value (maintaining zero cartographic error). The package implements the computationally intensive tasks in C++. This paper's contribution is to demonstrate, on real and synthetic maps, how recmap can be used, how it is implemented, and how it works with other statistical packages.
When Lenz proposed a simple model for phase transitions in magnetism, he couldn't have imagined that the "Ising model" was to become a jewel in the field of equilibrium statistical mechanics. Its role spans the spectrum, from a good pedagogical example to a universality class in critical phenomena. A quarter century ago, Katz, Lebowitz and Spohn found a similar treasure. By introducing a seemingly trivial modification to the Ising lattice gas, they took it into the vast realms of non-equilibrium statistical mechanics. An abundant variety of unexpected behavior emerged and caught many of us by surprise. We present a brief review of some of the new insights garnered and some of the outstanding puzzles, as well as speculate on the model's role in the future of non-equilibrium statistical physics.
Statistical comparisons of electoral variables are made between groups of electronic voting machines and voting centers classified by types of transmissions according to the volume of traffic in incoming and outgoing data of machines from and toward the National Electoral Council (CNE) totalizing servers. One unexpectedly finds two types of behavior in wire telephony data transmissions and only one type where cellular telephony is employed, contravening any reasonable electoral norm. Differentiation in data transmissions arises when comparing the number of incoming and outgoing data bytes per machine against the total number of votes per machine reported officially by the CNE. The respective distributions of electoral variables for each type of transmission show that the groups classified by it do not correspond to random sets of the electoral universe. In particular, the distributions for the NO percentage of votes per machine differ statistically across groups. The presidential elections of 1998, 2000 and the 2004 Presidential Recall Referendum (2004 PRR) are compared according to the type of transmissions in 2004 PRR. Statistically, the difference between the empirical distributions
Visual insights into a wide variety of statistical methods, for both didactic and data analytic purposes, can often be achieved through geometric diagrams and geometrically based statistical graphs. This paper extols and illustrates the virtues of the ellipse and her higher-dimensional cousins for both these purposes in a variety of contexts, including linear models, multivariate linear models and mixed-effect models. We emphasize the strong relationships among statistical methods, matrix-algebraic solutions and geometry that can often be easily understood in terms of ellipses.
In a quantum universe with a strong arrow of time, it is standard to postulate that the initial wave function started in a particular macrostate--the special low-entropy macrostate selected by the Past Hypothesis. Moreover, there is an additional postulate about statistical mechanical probabilities according to which the initial wave function is a "typical" choice in the macrostate (the Statistical Postulate). Together, they support a probabilistic version of the Second Law of Thermodynamics: typical initial wave functions will increase in entropy. Hence, there are two sources of randomness in such a universe: the quantum-mechanical probabilities of the Born rule and the statistical mechanical probabilities of the Statistical Postulate. I propose a new way to understand time's arrow in a quantum universe. It is based on what I call the Thermodynamic Theories of Quantum Mechanics. According to this perspective, there is a natural choice for the initial quantum state of the universe, which is given not by a wave function but by a density matrix. The initial density matrix of the universe is exactly the (normalized) projection operator onto the Past Hypothesis subspace (of the Hilbert
Modern statistical inference tasks often require iterative optimization methods to compute the solution. Convergence analysis from an optimization viewpoint only informs us how well the solution is approximated numerically but overlooks the sampling nature of the data. In contrast, recognizing the randomness in the data, statisticians are keen to provide uncertainty quantification, or confidence, for the solution obtained using iterative optimization methods. This paper makes progress along this direction by introducing the moment-adjusted stochastic gradient descents, a new stochastic optimization method for statistical inference. We establish non-asymptotic theory that characterizes the statistical distribution for certain iterative methods with optimization guarantees. On the statistical front, the theory allows for model mis-specification, with very mild conditions on the data. For optimization, the theory is flexible for both convex and non-convex cases. Remarkably, the moment-adjusting idea motivated from "error standardization" in statistics achieves a similar effect as acceleration in first-order optimization methods used to fit generalized linear models. We also demonstrat
This is an invited contribution to the discussion on Professor Deborah Mayo's paper, "On the Birnbaum argument for the strong likelihood principle," to appear in Statistical Science. Mayo clearly demonstrates that statistical methods violating the likelihood principle need not violate either the sufficiency or conditionality principle, thus refuting Birnbaum's claim. With the constraints of Birnbaum's theorem lifted, we revisit the foundations of statistical inference, focusing on some new foundational principles, the inferential model framework, and connections with sufficiency and conditioning. [arXiv:1302.7021]
This special issue is a product of the First Interdisciplinary Symposium on Statistical Challenges and Opportunities in Electronic Commerce Research, which took place on May 22--23, 2005, at the Robert H. Smith School of Business, University of Maryland, College Park (\url{www.smith.umd.edu/dit/statschallenges/}). The symposium brought together, for the first time, researchers from statistics, information systems, and related fields, all of whom work or are interested in empirical research related to electronic commerce. The goal of the symposium was to cross the borders, discuss joint research opportunities, expose this field and its statistical challenges, and promote collaboration between the different fields.
We consider Feller mean-reverting square-root diffusion, which has been applied to model a wide variety of processes with linearly state-dependent diffusion, such as stochastic volatility and interest rates in finance, and neuronal and population dynamics in the natural sciences. We focus on the statistical mixing (or superstatistical) process in which the parameter related to the mean value can fluctuate - a plausible mechanism for the emergence of heavy-tailed distributions. We obtain analytical results for the associated probability density function (both stationary and time dependent), its correlation structure and aggregation properties. Our results are applied to explain the statistics of stock traded volume at different aggregation scales.
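For readers unfamiliar with the process named in this abstract, the following is a minimal simulation sketch of a mean-reverting square-root diffusion, dX = kappa (mu - X) dt + sigma sqrt(X) dW, using an Euler scheme with full truncation. The function name `simulate_cir`, the parameter names, and the numerical values are illustrative assumptions and do not come from the paper; the sketch keeps the mean parameter `mu` fixed and does not implement the superstatistical mixing the abstract studies.

```python
import math
import random


def simulate_cir(x0, kappa, mu, sigma, dt, n_steps, seed=0):
    """Simulate one path of dX = kappa*(mu - X) dt + sigma*sqrt(X) dW.

    Uses an Euler scheme with full truncation: the drift and diffusion
    are evaluated at max(X, 0), so the square root is always defined.
    All names and defaults here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)  # local generator for reproducibility
    x = x0
    path = [max(x0, 0.0)]
    for _ in range(n_steps):
        x_pos = max(x, 0.0)                      # truncated state
        dw = rng.gauss(0.0, math.sqrt(dt))       # Brownian increment
        x = x + kappa * (mu - x_pos) * dt + sigma * math.sqrt(x_pos) * dw
        path.append(max(x, 0.0))                 # record the non-negative value
    return path


if __name__ == "__main__":
    path = simulate_cir(x0=1.0, kappa=2.0, mu=1.0, sigma=0.5,
                        dt=0.01, n_steps=20000)
    print("time-averaged level:", sum(path) / len(path))
```

With these assumed parameters the time average should settle close to `mu`, which is the mean-reverting behavior the abstract refers to; letting `mu` itself fluctuate between runs would be one way to explore the mixing mechanism described there.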
The areal modeling of the extremes of a natural process such as rainfall or temperature is important in environmental statistics; for example, understanding extreme areal rainfall is crucial in flood protection. This article reviews recent progress in the statistical modeling of spatial extremes, starting with sketches of the necessary elements of extreme value statistics and geostatistics. The main types of statistical models thus far proposed, based on latent variables, on copulas and on spatial max-stable processes, are described and then are compared by application to a data set on rainfall in Switzerland. Whereas latent variable modeling allows a better fit to marginal distributions, it fits the joint distributions of extremes poorly, so appropriately-chosen copula or max-stable models seem essential for successful spatial modeling of extremes.
Hospital profiling involves a comparison of a health care provider's structure, processes of care, or outcomes to a standard, often in the form of a report card. Given the ubiquity of report cards and similar consumer ratings in contemporary American culture, it is notable that these are a relatively recent phenomenon in health care. Prior to the 1986 release of Medicare hospital outcome data, little such information was publicly available. We review the historical evolution of hospital profiling with special emphasis on outcomes; present a detailed history of cardiac surgery report cards, the paradigm for modern provider profiling; discuss the potential unintended negative consequences of public report cards; and describe various statistical methodologies for quantifying the relative performance of cardiac surgery programs. Outstanding statistical issues are also described.
Identifying the risk factors for mental illnesses is of significant public health importance. Diagnosis, stigma associated with mental illnesses, comorbidity, and complex etiologies, among others, make it very challenging to study mental disorders. Genetic studies of mental illnesses date back at least a century, beginning with descriptive studies based on Mendelian laws of inheritance. A variety of study designs including twin studies, family studies, linkage analysis, and more recently, genomewide association studies have been employed to study the genetics of mental illnesses, or complex diseases in general. In this paper, I will present the challenges and methods from a statistical perspective and focus on genetic association studies.
In 1866 Gregor Mendel published a seminal paper containing the foundations of modern genetics. In 1936 Ronald Fisher published a statistical analysis of Mendel's data concluding that "the data of most, if not all, of the experiments have been falsified so as to agree closely with Mendel's expectations." The accusation gave rise to a controversy which has reached the present time. There are reasonable grounds to assume that a certain unconscious bias was systematically introduced in Mendel's experimentation. Based on this assumption, a probability model that fits Mendel's data and does not offend Fisher's analysis is given. This reconciliation model may well be the end of the Mendel--Fisher controversy.
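As background to the "too good to be true" concern in this abstract, the sketch below computes a Pearson chi-square goodness-of-fit statistic for a single experiment against a 3:1 Mendelian ratio. The counts used (5474 round and 1850 wrinkled seeds) are the figures commonly quoted for Mendel's seed-shape experiment and are an assumption brought in for illustration; they do not appear in the abstract. The helper name `chi_square_gof` is hypothetical, and the sketch reproduces neither Fisher's full analysis nor the paper's reconciliation model.

```python
def chi_square_gof(observed, expected_ratios):
    """Pearson chi-square statistic for observed counts against
    expected proportions given as ratios (e.g. [3, 1] for 3:1)."""
    total = sum(observed)
    ratio_sum = sum(expected_ratios)
    stat = 0.0
    for obs, ratio in zip(observed, expected_ratios):
        expected = total * ratio / ratio_sum   # expected count in this class
        stat += (obs - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Assumed counts: round vs. wrinkled seeds, tested against 3:1.
    stat = chi_square_gof([5474, 1850], [3, 1])
    print(f"chi-square statistic (1 degree of freedom): {stat:.4f}")
```

A small statistic in one experiment is unremarkable on its own; the concern described in the abstract arises when the agreement is aggregated across many experiments and turns out to be closer than chance would usually allow.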
Howard Raiffa earned his bachelor's degree in mathematics, his master's degree in statistics and his Ph.D. in mathematics at the University of Michigan. Since 1957, Raiffa has been a member of the faculty at Harvard University, where he is now the Frank P. Ramsey Chair in Managerial Economics (Emeritus) in the Graduate School of Business Administration and the Kennedy School of Government. A pioneer in the creation of the field known as decision analysis, his research interests span statistical decision theory, game theory, behavioral decision theory, risk analysis and negotiation analysis. Raiffa has supervised more than 90 doctoral dissertations and written 11 books. His new book is Negotiation Analysis: The Science and Art of Collaborative Decision Making. Another book, Smart Choices, co-authored with his former doctoral students John Hammond and Ralph Keeney, was the CPR (formerly known as the Center for Public Resources) Institute for Dispute Resolution Book of the Year in 1998. Raiffa helped to create the International Institute for Applied Systems Analysis and he later became its first Director, serving in that capacity from 1972 to 1975. His many honors and awards include t
Rejoinder to "Statistical Modeling of Spatial Extremes" by A. C. Davison, S. A. Padoan and M. Ribatet [arXiv:1208.3378].
This paper discusses different needs and approaches to establishing ``causation'' that are relevant in legal cases involving statistical input based on epidemiological (or more generally observational or population-based) information. We distinguish between three versions of ``cause'': the first involves negligence in providing or allowing exposure, the second involves ``cause'' as it is shown through a scientifically proved increased risk of an outcome from the exposure in a population, and the third considers ``cause'' as it might apply to an individual plaintiff based on the first two. The population-oriented ``cause'' is that commonly addressed by statisticians, and we propose a variation on the Bradford Hill approach to testing such causality in an observational framework, and discuss how such a systematic series of tests might be considered in a legal context. We review some current legal approaches to using probabilistic statements, and link these with the scientific methodology as developed here. In particular, we provide an approach both to the idea of individual outcomes being caused on a balance of probabilities, and to the idea of material contribution to such outcomes.
Comment on "Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies" [arXiv:1102.2774]
Rejoinder to "Quantifying the Fraction of Missing Information for Hypothesis Testing in Statistical and Genetic Studies" [arXiv:1102.2774]