Found 20 results in total.
Part 1: Probability and Random Variables
1. The Meaning of Probability
2. The Axioms of Probability
3. Repeated Trials
4. The Concept of a Random Variable
5. Functions of One Random Variable
6. Two Random Variables
7. Sequences of Random Variables
8. Statistics
Part 2: Stochastic Processes
9. General Concepts
10. Random Walk and Other Applications
11. Spectral Representation
12. Spectral Estimation
13. Mean Square Estimation
14. Entropy
15. Markov Chains
16. Markov Processes and Queueing Theory
Presents a discussion of matching, randomization, random sampling, and other methods of controlling extraneous variation. The objective was to specify the benefits of randomization in estimating causal effects of treatments. It is concluded that randomization should be employed whenever possible but that the use of carefully controlled nonrandomized data to estimate causal effects is a reasonable and necessary procedure in many cases.
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets.
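A minimal sketch (not the paper's code) of why random search wins when only a few hyper-parameters matter: under a toy `score` function in which only the learning rate is relevant, a 3x3 grid probes just 3 distinct learning rates, while 9 random trials probe 9. All names and ranges here are invented for illustration.

```python
import random

# Toy validation score: only the learning rate really matters, mimicking
# the paper's finding that few hyper-parameters are important.
def score(lr, momentum):
    return -(lr - 0.017) ** 2  # higher is better; momentum is irrelevant

budget = 9

# Grid search: a 3 x 3 grid tries only 3 distinct learning-rate values.
grid = [(lr, m) for lr in (0.001, 0.01, 0.1) for m in (0.5, 0.9, 0.99)]
best_grid = max(grid, key=lambda p: score(*p))

# Random search: the same budget yields 9 distinct learning-rate values.
random.seed(0)
rand = [(10 ** random.uniform(-3, -1), random.uniform(0.5, 0.99))
        for _ in range(budget)]
best_rand = max(rand, key=lambda p: score(*p))

print("grid best:", best_grid, score(*best_grid))
print("random best:", best_rand, score(*best_rand))
```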
Modern computer-intensive statistical methods play a key role in solving many problems across a wide range of scientific disciplines. This new edition of the bestselling Randomization, Bootstrap and Monte Carlo Methods in Biology illustrates the value of a number of these methods with an emphasis on biological applications. This textbook focuses on three related areas in computational statistics: randomization, bootstrapping, and Monte Carlo methods of inference. The author emphasizes the sampling approach within randomization testing and confidence intervals. Similar to randomization, the book shows how bootstrapping, or resampling, can be used for confidence intervals and tests of significance. It also explores how to use Monte Carlo methods to test hypotheses and construct confidence intervals. New to the third edition: updated information on regression and time series analysis, multivariate methods, survival and growth data, as well as software for computational statistics; references that reflect recent developments in methodology and computing techniques; and additional references on new applications of computer-intensive methods in biology. Providing comprehensive coverage of computer-intensive applications while also offering data sets online, Randomization, Bootstrap and Monte Carlo Methods in Biology, Third Edition supplies a solid foundation for the ever-expanding field of statistics and quantitative analysis in biology.
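A quick illustration of one of the book's three themes: a percentile bootstrap confidence interval, here for the mean of invented toy data (the data-generating choices are ours, not the book's).

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=50)  # toy biological measurements

# Percentile bootstrap: resample with replacement, recompute the statistic,
# and read the CI off the empirical distribution of the resampled means.
boot_means = np.array([rng.choice(data, size=data.size, replace=True).mean()
                       for _ in range(10_000)])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {data.mean():.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```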
This paper presents the first randomized approach to kinodynamic planning (also known as trajectory planning or trajectory design). The task is to determine control inputs to drive a robot from an initial configuration and velocity to a goal configuration and velocity while obeying physically based dynamical models and avoiding obstacles in the robot’s environment. The authors consider generic systems that express the nonlinear dynamics of a robot in terms of the robot’s high-dimensional configuration space. Kinodynamic planning is treated as a motion-planning problem in a higher dimensional state space that has both first-order differential constraints and obstacle-based global constraints. The state space serves the same role as the configuration space for basic path planning; however, standard randomized path-planning techniques do not directly apply to planning trajectories in the state space. The authors have developed a randomized planning approach that is particularly tailored to trajectory planning problems in high-dimensional state spaces. The basis for this approach is the construction of rapidly exploring random trees, which offer benefits that are similar to those obtained by successful randomized holonomic planning methods but apply to a much broader class of problems. Theoretical analysis of the algorithm is given. Experimental results are presented for an implementation that computes trajectories for hovercrafts and satellites in cluttered environments, resulting in state spaces of up to 12 dimensions.
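A minimal sketch of the rapidly exploring random tree idea in a plain 2-D state space, assuming no obstacles and no differential constraints (both of which the paper's planner handles); the step size and goal tolerance are invented.

```python
import math
import random

random.seed(0)
start, goal = (0.0, 0.0), (9.0, 9.0)
nodes = {start: None}  # node -> parent

def nearest(q):
    # Nearest existing tree node to the random sample.
    return min(nodes, key=lambda n: math.dist(n, q))

def steer(a, b, step=0.5):
    # Move from a toward b by at most one step.
    d = math.dist(a, b)
    t = min(1.0, step / d) if d > 0 else 0.0
    return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))

q_new = start
for _ in range(2000):
    q_rand = (random.uniform(0, 10), random.uniform(0, 10))
    q_near = nearest(q_rand)
    q_new = steer(q_near, q_rand)
    nodes[q_new] = q_near             # extend the tree toward the sample
    if math.dist(q_new, goal) < 0.5:  # close enough to the goal region
        break

# Recover the path by walking parent links back to the start.
path, q = [], q_new
while q is not None:
    path.append(q)
    q = nodes[q]
print("tree size:", len(nodes), "path length:", len(path))
```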
We present Conditional Random Fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
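A sketch of one piece of the machinery: Viterbi decoding of the most likely label sequence under a linear-chain CRF, assuming the log-potentials (emission and transition scores) have already been estimated; random toy values stand in for trained parameters.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Most likely label sequence under linear-chain CRF scores.

    emissions: (T, L) per-position label scores; transitions: (L, L)
    score for moving from label i to label j. Both are log-potentials.
    """
    T, L = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        # total[i, j] = best score ending in label i at t-1, then j at t.
        total = score[:, None] + transitions + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t][labels[-1]]))
    return labels[::-1]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(6, 3)), rng.normal(size=(3, 3))))
```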
BACKGROUND: The number of Mendelian randomization analyses including large numbers of genetic variants is rapidly increasing. This is due to the proliferation of genome-wide association studies, and the desire to obtain more precise estimates of causal effects. However, some genetic variants may not be valid instrumental variables, in particular due to them having more than one proximal phenotypic correlate (pleiotropy). METHODS: We view Mendelian randomization with multiple instruments as a meta-analysis, and show that bias caused by pleiotropy can be regarded as analogous to small study bias. Causal estimates using each instrument can be displayed visually by a funnel plot to assess potential asymmetry. Egger regression, a tool to detect small study bias in meta-analysis, can be adapted to test for bias from pleiotropy, and the slope coefficient from Egger regression provides an estimate of the causal effect. Under the assumption that the association of each genetic variant with the exposure is independent of the pleiotropic effect of the variant (not via the exposure), Egger's test gives a valid test of the null causal hypothesis and a consistent causal effect estimate even when all the genetic variants are invalid instrumental variables. RESULTS: We illustrate the use of this approach by re-analysing two published Mendelian randomization studies of the causal effect of height on lung function, and the causal effect of blood pressure on coronary artery disease risk. The conservative nature of this approach is illustrated with these examples. CONCLUSIONS: An adaption of Egger regression (which we call MR-Egger) can detect some violations of the standard instrumental variable assumptions, and provide an effect estimate which is not subject to these violations. The approach provides a sensitivity analysis for the robustness of the findings from a Mendelian randomization investigation.
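A minimal sketch of the MR-Egger idea on invented summarized data (the per-variant associations and standard errors below are illustrative, not from the paper): a weighted regression of outcome associations on exposure associations with an unconstrained intercept.

```python
import numpy as np
import statsmodels.api as sm

# Toy summarized data: per-variant associations with exposure and outcome,
# plus standard errors of the outcome associations (illustrative values;
# variants are oriented so all exposure associations are positive).
beta_x = np.array([0.12, 0.20, 0.08, 0.15, 0.25])
beta_y = np.array([0.06, 0.11, 0.05, 0.07, 0.14])
se_y = np.array([0.02, 0.03, 0.02, 0.025, 0.03])

# MR-Egger: weighted regression of outcome on exposure associations WITH
# an intercept; the intercept tests for directional pleiotropy and the
# slope estimates the causal effect.
X = sm.add_constant(beta_x)
fit = sm.WLS(beta_y, X, weights=1 / se_y ** 2).fit()
print("intercept (pleiotropy test):", fit.params[0])
print("causal effect (slope):", fit.params[1])
```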
Developments in genome-wide association studies and the increasing availability of summary genetic association data have made application of Mendelian randomization relatively straightforward. However, obtaining reliable results from a Mendelian randomization investigation remains problematic, as the conventional inverse-variance weighted method only gives consistent estimates if all of the genetic variants in the analysis are valid instrumental variables. We present a novel weighted median estimator for combining data on multiple genetic variants into a single causal estimate. This estimator is consistent even when up to 50% of the information comes from invalid instrumental variables. In a simulation analysis, it is shown to have better finite-sample Type 1 error rates than the inverse-variance weighted method, and is complementary to the recently proposed MR-Egger (Mendelian randomization-Egger) regression method. In analyses of the causal effects of low-density lipoprotein cholesterol and high-density lipoprotein cholesterol on coronary artery disease risk, the inverse-variance weighted method suggests a causal effect of both lipid fractions, whereas the weighted median and MR-Egger regression methods suggest a null effect of high-density lipoprotein cholesterol that corresponds with the experimental evidence. Both median-based and MR-Egger regression methods should be considered as sensitivity analyses for Mendelian randomization investigations with multiple genetic variants.
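A sketch of the weighted median estimator, reusing the same invented summarized data as the MR-Egger sketch above so the two sensitivity analyses can be compared side by side.

```python
import numpy as np

# Toy summarized data (illustrative): per-variant associations.
beta_x = np.array([0.12, 0.20, 0.08, 0.15, 0.25])
beta_y = np.array([0.06, 0.11, 0.05, 0.07, 0.14])
se_y = np.array([0.02, 0.03, 0.02, 0.025, 0.03])

ratio = beta_y / beta_x            # per-variant causal (ratio) estimates
weights = (beta_x / se_y) ** 2     # inverse variance of each ratio

# Weighted median: sort the ratios and take the value where the cumulative
# normalized weight crosses 0.5, interpolating between adjacent variants.
order = np.argsort(ratio)
r, w = ratio[order], weights[order]
cum = (np.cumsum(w) - 0.5 * w) / w.sum()
est = np.interp(0.5, cum, r)
print("weighted median causal estimate:", est)
```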
BACKGROUND: Because of specific methodological difficulties in conducting randomized trials, surgical research remains dependent predominantly on observational or non-randomized studies. Few validated instruments are available to determine the methodological quality of such studies either from the reader's perspective or for the purpose of meta-analysis. The aim of the present study was to develop and validate such an instrument. METHODS: After an initial conceptualization phase of a methodological index for non-randomized studies (MINORS), a list of 12 potential items was sent to 100 experts from different surgical specialties for evaluation and was also assessed by 10 clinical methodologists. Subsequent testing involved the assessment of inter-reviewer agreement, test-retest reliability at 2 months, internal consistency reliability and external validity. RESULTS: The final version of MINORS contained 12 items, the first eight being specifically for non-comparative studies. Reliability was established on the basis of good inter-reviewer agreement, high test-retest reliability by the kappa-coefficient and good internal consistency by a high Cronbach's alpha-coefficient. External validity was established in terms of the ability of MINORS to identify excellent trials. CONCLUSIONS: MINORS is a valid instrument designed to assess the methodological quality of non-randomized surgical studies, whether comparative or non-comparative. The next step will be to determine its external validity when used in a large number of studies and to compare it with other existing instruments.
Associations between modifiable exposures and disease seen in observational epidemiology are sometimes confounded and thus misleading, despite our best efforts to improve the design and analysis of studies. Mendelian randomization-the random assortment of genes from parents to offspring that occurs during gamete formation and conception-provides one method for assessing the causal nature of some environmental exposures. The association between a disease and a polymorphism that mimics the biological link between a proposed exposure and disease is not generally susceptible to the reverse causation or confounding that may distort interpretations of conventional observational studies. Several examples where the phenotypic effects of polymorphisms are well documented provide encouraging evidence of the explanatory power of Mendelian randomization and are described. The limitations of the approach include confounding by polymorphisms in linkage disequilibrium with the polymorphism under study, that polymorphisms may have several phenotypic effects associated with disease, the lack of suitable polymorphisms for studying modifiable exposures of interest, and canalization-the buffering of the effects of genetic variation during development. Nevertheless, Mendelian randomization provides new opportunities to test causality and demonstrates how investment in the human genome project may contribute to understanding and preventing the adverse effects on human health of modifiable exposures.
Observational epidemiological studies suffer from many potential biases, from confounding and from reverse causation, and this limits their ability to robustly identify causal associations. Several high-profile situations exist in which randomized controlled trials of precisely the same intervention that has been examined in observational studies have produced markedly different findings. In other observational sciences, the use of instrumental variable (IV) approaches has been one approach to strengthening causal inferences in non-experimental situations. The use of germline genetic variants that proxy for environmentally modifiable exposures as instruments for these exposures is one form of IV analysis that can be implemented within observational epidemiological studies. The method has been referred to as 'Mendelian randomization', and can be considered as analogous to randomized controlled trials. This paper outlines Mendelian randomization, draws parallels with IV methods, provides examples of implementation of the approach and discusses limitations of the approach and some methods for dealing with these.
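A toy simulation of the IV logic outlined here, assuming a single valid genetic instrument: the Wald ratio recovers the causal effect that naive regression, distorted by the unmeasured confounder, misses. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated data: genotype G instruments exposure X; U confounds X and Y.
G = rng.binomial(2, 0.3, n)            # variant with allele frequency 0.3
U = rng.normal(size=n)                 # unmeasured confounder
X = 0.5 * G + U + rng.normal(size=n)   # exposure
Y = 0.3 * X + U + rng.normal(size=n)   # outcome; true causal effect = 0.3

# Naive regression is confounded; the IV (Wald) ratio is not.
naive = np.cov(X, Y)[0, 1] / np.var(X)
wald = np.cov(G, Y)[0, 1] / np.cov(G, X)[0, 1]
print(f"confounded OLS estimate: {naive:.3f}")
print(f"IV (Wald ratio) estimate: {wald:.3f}")
```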
This paper presents a simple model for such processes as spin diffusion or conduction in the "impurity band." These processes involve transport in a lattice which is in some sense random, and in them diffusion is expected to take place via quantum jumps between localized sites. In this simple model the essential randomness is introduced by requiring the energy to vary randomly from site to site. It is shown that at low enough densities no diffusion at all can take place, and the criteria for transport to occur are given.
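A rough numerical illustration, assuming a 1-D tight-binding analogue of the model (this is our toy construction, not the paper's calculation): site energies vary randomly, and the inverse participation ratio of the eigenstates indicates localization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, hopping, disorder = 400, 1.0, 4.0

# Constant nearest-neighbor hopping plus random site energies.
H = np.diag(rng.uniform(-disorder / 2, disorder / 2, n))
H += np.diag(np.full(n - 1, hopping), 1) + np.diag(np.full(n - 1, hopping), -1)

_, states = np.linalg.eigh(H)
ipr = (states ** 4).sum(axis=0)  # ~1/n for extended states, O(1) if localized
print(f"mean IPR: {ipr.mean():.3f} (1/n = {1 / n:.4f} would be fully extended)")
```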
Suppose we are given a vector $f$ in a class ${\cal F} \subset \mathbb{R}^N$, e.g., a class of digital signals or digital images. How many linear measurements do we need to make about $f$ to be able to recover $f$ to within precision $\epsilon$ in the Euclidean ($\ell_2$) metric? This paper shows that if the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct $f$ to within very high accuracy from a small number of random measurements by solving a simple linear program. More precisely, suppose that the $n$th largest entry of the vector $|f|$ (or of its coefficients in a fixed basis) obeys $|f|_{(n)} \le R \cdot n^{-1/p}$, where $R > 0$ and $p > 0$. Suppose that we take measurements $y_k = \langle f, X_k \rangle$, $k = 1, \ldots, K$, where the $X_k$ are $N$-dimensional Gaussian vectors with independent standard normal entries. Then for each $f$ obeying the decay estimate above for some $0 < p < 1$ and with overwhelming probability, our reconstruction $f^\sharp$, defined as the solution to the constraints $y_k = \langle f^\sharp, X_k \rangle$ with minimal $\ell_1$ norm, obeys $$\Vert f - f^\sharp \Vert_{\ell_2} \le C_p \cdot R \cdot (K/\log N)^{-r}, \quad r = 1/p - 1/2.$$ There is a sense in which this result is optimal; it is generally impossible to obtain a higher accuracy from any set of $K$ measurements whatsoever. The methodology extends to various other random measurement ensembles; for example, we show that similar results hold if one observes a few randomly sampled Fourier coefficients of $f$. In fact, the results are quite general and require only two hypotheses on the measurement ensemble, which are detailed.
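A small sketch of the recovery procedure: the $\ell_1$-minimization problem recast as a linear program over the pair $(f, t)$ with $|f_i| \le t_i$, solved with SciPy; the problem sizes and sparsity level are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, K, S = 128, 40, 5  # signal length, measurements, sparsity

f = np.zeros(N)
f[rng.choice(N, S, replace=False)] = rng.normal(size=S)  # sparse signal
X = rng.normal(size=(K, N))                              # Gaussian measurements
y = X @ f

# min ||f||_1 s.t. Xf = y, as an LP: minimize sum(t) over (f, t).
c = np.concatenate([np.zeros(N), np.ones(N)])
A_ub = np.block([[np.eye(N), -np.eye(N)],    #  f - t <= 0
                 [-np.eye(N), -np.eye(N)]])  # -f - t <= 0
b_ub = np.zeros(2 * N)
A_eq = np.hstack([X, np.zeros((K, N))])      # Xf = y, t unconstrained here
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * N + [(0, None)] * N)
f_hat = res.x[:N]
print("recovery error:", np.linalg.norm(f - f_hat))
```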
Genome-wide association studies, which typically report regression coefficients summarizing the associations of many genetic variants with various traits, are potentially a powerful source of data for Mendelian randomization investigations. We demonstrate how such coefficients from multiple variants can be combined in a Mendelian randomization analysis to estimate the causal effect of a risk factor on an outcome. The bias and efficiency of estimates based on summarized data are compared to those based on individual-level data in simulation studies. We investigate the impact of gene-gene interactions, linkage disequilibrium, and 'weak instruments' on these estimates. Both an inverse-variance weighted average of variant-specific associations and a likelihood-based approach for summarized data give similar estimates and precision to the two-stage least squares method for individual-level data, even when there are gene-gene interactions. However, these summarized data methods overstate precision when variants are in linkage disequilibrium. If the P-value in a linear regression of the risk factor for each variant is less than 1×10⁻⁵, then weak instrument bias will be small. We use these methods to estimate the causal effect of low-density lipoprotein cholesterol (LDL-C) on coronary artery disease risk using published data on five genetic variants. A 30% reduction in LDL-C is estimated to reduce coronary artery disease risk by 67% (95% CI: 54% to 76%). We conclude that Mendelian randomization investigations using summarized data from uncorrelated variants are similarly efficient to those using individual-level data, although the necessary assumptions cannot be so fully assessed.
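A sketch of the inverse-variance weighted estimator for summarized data, again on the invented per-variant associations used in the sketches above; valid for uncorrelated (not in linkage disequilibrium) variants.

```python
import numpy as np

# Toy summarized data (illustrative): per-variant associations.
beta_x = np.array([0.12, 0.20, 0.08, 0.15, 0.25])
beta_y = np.array([0.06, 0.11, 0.05, 0.07, 0.14])
se_y = np.array([0.02, 0.03, 0.02, 0.025, 0.03])

# Inverse-variance weighted average of the per-variant ratio estimates.
ratio = beta_y / beta_x
w = (beta_x / se_y) ** 2
ivw = np.sum(w * ratio) / np.sum(w)
se_ivw = 1 / np.sqrt(np.sum(w))
print(f"IVW estimate: {ivw:.3f} (SE {se_ivw:.3f})")
```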
BACKGROUND AND PURPOSE: Assessment of the quality of randomized controlled trials (RCTs) is common practice in systematic reviews. However, the reliability of data obtained with most quality assessment scales has not been established. This report describes 2 studies designed to investigate the reliability of data obtained with the Physiotherapy Evidence Database (PEDro) scale developed to rate the quality of RCTs evaluating physical therapist interventions. METHOD: In the first study, 11 raters independently rated 25 RCTs randomly selected from the PEDro database. In the second study, 2 raters rated 120 RCTs randomly selected from the PEDro database, and disagreements were resolved by a third rater; this generated a set of individual rater and consensus ratings. The process was repeated by independent raters to create a second set of individual and consensus ratings. Reliability of ratings of PEDro scale items was calculated using multirater kappas, and reliability of the total (summed) score was calculated using intraclass correlation coefficients (ICC[1,1]). RESULTS: The kappa value for each of the 11 items ranged from .36 to .80 for individual assessors and from .50 to .79 for consensus ratings generated by groups of 2 or 3 raters. The ICC for the total score was .56 (95% confidence interval = .47 to .65) for ratings by individuals, and the ICC for consensus ratings was .68 (95% confidence interval = .57 to .76). DISCUSSION AND CONCLUSION: The reliability of ratings of PEDro scale items varied from "fair" to "substantial," and the reliability of the total PEDro score was "fair" to "good."
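For illustration, a minimal computation of ICC(1,1) from a one-way ANOVA decomposition, on invented ratings (the multirater kappa step for individual items is omitted).

```python
import numpy as np

# Toy ratings: 8 trials (rows) each scored by 3 raters (columns).
rng = np.random.default_rng(0)
truth = rng.integers(2, 9, size=8)
ratings = truth[:, None] + rng.integers(-1, 2, size=(8, 3))

n, k = ratings.shape
grand = ratings.mean()
ms_between = k * ((ratings.mean(axis=1) - grand) ** 2).sum() / (n - 1)
ms_within = ((ratings - ratings.mean(axis=1, keepdims=True)) ** 2).sum() / (n * (k - 1))

# One-way random-effects ICC(1,1), the form used for the total score.
icc_11 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1,1) = {icc_11:.2f}")
```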
We argue that the random oracle model—where all parties have access to a public random oracle—provides a bridge between cryptographic theory and cryptographic practice. In the paradigm we suggest, a practical protocol P is produced by first devising and proving correct a protocol PR for the random oracle model, and then replacing oracle accesses by the computation of an “appropriately chosen” function h. This paradigm yields protocols much more efficient than standard ones while retaining many of the advantages of provable security. We illustrate these gains for problems including encryption, signatures, and zero-knowledge proofs.
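A toy illustration of the paradigm (our construction, not the paper's): a hash-based commitment whose security argument would model H as a random oracle, with SHA-256 standing in for the "appropriately chosen" function h.

```python
import hashlib
import secrets

# Instantiate the random oracle H with a concrete hash function.
def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

# Toy commitment scheme: commit = H(r || m); reveal (r, m) to open.
def commit(message: bytes):
    r = secrets.token_bytes(32)  # fresh randomness hides the message
    return H(r + message), r

def verify(commitment: bytes, r: bytes, message: bytes) -> bool:
    return H(r + message) == commitment

c, r = commit(b"attack at dawn")
print(verify(c, r, b"attack at dawn"))   # True
print(verify(c, r, b"attack at dusk"))   # False
```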
Models for the analysis of longitudinal data must recognize the relationship between serial observations on the same unit. Multivariate models with general covariance structure are often difficult to apply to highly unbalanced data, whereas two-stage random-effects models can be used easily. In two-stage models, the probability distributions for the response vectors of different individuals belong to a single family, but some random-effects parameters vary across individuals, with a distribution specified at the second stage. A general family of models is discussed, which includes both growth models and repeated-measures models as special cases. A unified approach to fitting these models, based on a combination of empirical Bayes and maximum likelihood estimation of model parameters and using the EM algorithm, is discussed. Two examples are taken from a current epidemiological study of the health effects of air pollution.
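A sketch of fitting such a two-stage model on simulated longitudinal data. Note that statsmodels' MixedLM uses (restricted) maximum likelihood rather than the paper's empirical Bayes/EM formulation, but it targets the same random-intercept-and-slope model; all simulation parameters are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated repeated measures on 30 subjects: the second stage gives each
# subject its own random intercept and slope over time.
rng = np.random.default_rng(0)
rows = []
for subj in range(30):
    b0 = rng.normal(0, 1.0)   # random intercept
    b1 = rng.normal(0, 0.3)   # random slope
    for t in range(6):
        y = 2.0 + 0.5 * t + b0 + b1 * t + rng.normal(0, 0.5)
        rows.append({"subject": subj, "time": t, "y": y})
df = pd.DataFrame(rows)

# Random intercept and slope for each subject.
model = smf.mixedlm("y ~ time", df, groups=df["subject"], re_formula="~time")
print(model.fit().summary())
```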
Much of previous attention on decision trees focuses on the splitting criteria and optimization of tree sizes. The dilemma between overfitting and achieving maximum accuracy is seldom resolved. A method to construct a decision tree based classifier is proposed that maintains highest accuracy on training data and improves on generalization accuracy as it grows in complexity. The classifier consists of multiple trees constructed systematically by pseudorandomly selecting subsets of components of the feature vector, that is, trees constructed in randomly chosen subspaces. The subspace method is compared to single-tree classifiers and other forest construction methods by experiments on publicly available datasets, where the method's superiority is demonstrated. We also discuss independence between trees in a forest and relate that to the combined classification accuracy.
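A minimal sketch of the random subspace construction, with the tree count and subspace size invented for illustration: each tree is grown on the full training set but sees only a random subset of the features, and predictions are combined by majority vote.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_trees, n_feats = 50, X.shape[1] // 2
forest = []
for _ in range(n_trees):
    feats = rng.choice(X.shape[1], n_feats, replace=False)  # random subspace
    tree = DecisionTreeClassifier().fit(X_tr[:, feats], y_tr)
    forest.append((feats, tree))

# Majority vote across the trees.
votes = np.mean([t.predict(X_te[:, f]) for f, t in forest], axis=0)
acc = np.mean((votes > 0.5) == y_te)
print(f"random-subspace forest accuracy: {acc:.3f}")
```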
BACKGROUND: In patients with acute ischemic stroke caused by a proximal intracranial arterial occlusion, intraarterial treatment is highly effective for emergency revascularization. However, proof of a beneficial effect on functional outcome is lacking. METHODS: We randomly assigned eligible patients to either intraarterial treatment plus usual care or usual care alone. Eligible patients had a proximal arterial occlusion in the anterior cerebral circulation that was confirmed on vessel imaging and that could be treated intraarterially within 6 hours after symptom onset. The primary outcome was the modified Rankin scale score at 90 days; this categorical scale measures functional outcome, with scores ranging from 0 (no symptoms) to 6 (death). The treatment effect was estimated with ordinal logistic regression as a common odds ratio, adjusted for prespecified prognostic factors. The adjusted common odds ratio measured the likelihood that intraarterial treatment would lead to lower modified Rankin scores, as compared with usual care alone (shift analysis). RESULTS: We enrolled 500 patients at 16 medical centers in The Netherlands (233 assigned to intraarterial treatment and 267 to usual care alone). The mean age was 65 years (range, 23 to 96), and 445 patients (89.0%) were treated with intravenous alteplase before randomization. Retrievable stents were used in 190 of the 233 patients (81.5%) assigned to intraarterial treatment. The adjusted common odds ratio was 1.67 (95% confidence interval [CI], 1.21 to 2.30). There was an absolute difference of 13.5 percentage points (95% CI, 5.9 to 21.2) in the rate of functional independence (modified Rankin score, 0 to 2) in favor of the intervention (32.6% vs. 19.1%). There were no significant differences in mortality or the occurrence of symptomatic intracerebral hemorrhage. CONCLUSIONS: In patients with acute ischemic stroke caused by a proximal intracranial occlusion of the anterior circulation, intraarterial treatment administered within 6 hours after stroke onset was effective and safe. (Funded by the Dutch Heart Foundation and others; MR CLEAN Netherlands Trial Registry number, NTR1804, and Current Controlled Trials number, ISRCTN10888758.).
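A sketch of the shift analysis on simulated data (illustrative only, not the trial's data and without the trial's prognostic-factor adjustment): a proportional-odds model whose treatment coefficient yields a common odds ratio for a shift toward lower modified Rankin scores.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Simulated modified Rankin scores 0-6, shifted slightly toward lower
# (better) scores in the treatment arm.
rng = np.random.default_rng(0)
n = 500
treat = rng.integers(0, 2, n)
latent = rng.normal(3.0, 1.5, n) - 0.5 * treat
mrs = np.clip(np.round(latent), 0, 6).astype(int)
df = pd.DataFrame({"mrs": pd.Categorical(mrs, ordered=True), "treat": treat})

# Proportional-odds ("shift") model. In statsmodels' parametrization a
# negative treatment coefficient shifts mass toward lower scores, so
# exp(-coef) is the common odds ratio for a better (lower) mRS.
fit = OrderedModel(df["mrs"], df[["treat"]], distr="logit").fit(
    method="bfgs", disp=False)
print("common odds ratio:", np.exp(-fit.params["treat"]))
```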
From the Publisher: A revised and expanded edition of this classic reference/text, covering the latest techniques for the analysis and measurement of stationary and nonstationary random data passing through physical systems. With more than 100,000 copies in print and six foreign translations, the first edition standardized the methodology in this field. This new edition covers all new procedures developed since 1971 and extends the application of random data analysis to aerospace and automotive research; digital data analysis; dynamic test programs; fluid turbulence analysis; industrial noise control; oceanographic data analysis; system identification problems; and many other fields. Includes new formulas for statistical error analysis of desired estimates, new examples and problem sets.