Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.
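As an illustration of the correlation-based family mentioned above, the following is a minimal sketch rather than the paper's implementation. It assumes each user's votes are stored in a dictionary, weights neighbours by the Pearson correlation over co-rated items (with each user's mean taken over all of that user's votes), and scores predictions by average absolute deviation. All function names and the toy data are hypothetical.

```python
from math import sqrt


def mean_vote(votes):
    """Average vote of one user over every item that user has voted on."""
    return sum(votes.values()) / len(votes)


def pearson(a, b):
    """Pearson correlation between two users over their co-rated items."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a, mean_b = mean_vote(a), mean_vote(b)
    num = sum((a[j] - mean_a) * (b[j] - mean_b) for j in common)
    den = sqrt(sum((a[j] - mean_a) ** 2 for j in common)
               * sum((b[j] - mean_b) ** 2 for j in common))
    return num / den if den else 0.0


def predict(active, others, item):
    """Predict the active user's vote on `item` from correlated neighbours."""
    base = mean_vote(active)
    weighted, norm = 0.0, 0.0
    for other in others:
        if item not in other:
            continue
        w = pearson(active, other)
        weighted += w * (other[item] - mean_vote(other))
        norm += abs(w)
    # Fall back to the active user's mean when no neighbour is informative.
    return base if norm == 0 else base + weighted / norm


def average_absolute_deviation(predicted, actual):
    """Mean absolute difference between predicted and observed votes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: the active user has not voted on item "c".
active_user = {"a": 5, "b": 3}
database = [{"a": 5, "b": 3, "c": 4}, {"a": 1, "b": 5, "c": 1}]
estimate = predict(active_user, database, "c")
```

The second evaluation metric, the utility of a ranked list, is not sketched here because the abstract does not give its formula.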
While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
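The contrastive loss described above can be sketched as a softmax cross-entropy in which each predicted latent vector must pick out its true future sample from a set of negatives. The snippet below is a simplified illustration under stated assumptions: it uses a plain dot product as the score, treats the other samples in the batch as negatives, and leaves out the encoder and the autoregressive model entirely. The function names are hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def contrastive_loss(predictions, targets):
    """Average cross-entropy of identifying the true future latent.

    predictions[i] is the predicted latent for sample i, targets[i] is the
    matching true future latent, and every other target acts as a negative.
    """
    total = 0.0
    for i, pred in enumerate(predictions):
        scores = [dot(pred, t) for t in targets]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[i]
    return total / len(predictions)


# Uninformative predictions cannot distinguish the positive from negatives,
# so the loss equals log(N) for a batch of N samples.
zeros = [[0.0, 0.0] for _ in range(4)]
chance_level = contrastive_loss(zeros, zeros)
```

A loss below log(N) indicates that the predictions carry some information about which future sample is the true one.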
A new twin study suggests your genes may play a bigger role in your future success than your upbringing. Researchers found that IQ, which is largely genetically influenced, strongly predicts education, career, and income. Even twins raised in the same household diverged based on genetic differences.
The effects of lesions, receptor blocking, electrical self-stimulation, and drugs of abuse suggest that midbrain dopamine systems are involved in processing reward information and learning approach behavior. Most dopamine neurons show phasic activations after primary liquid and food rewards and conditioned, reward-predicting visual and auditory stimuli. They show biphasic, activation-depression responses after stimuli that resemble reward-predicting stimuli or are novel or particularly salient. However, only few phasic activations follow aversive stimuli. Thus dopamine neurons label environmental stimuli with appetitive value, predict and detect rewards and signal alerting and motivating events. By failing to discriminate between different rewards, dopamine neurons appear to emit an alerting message about the surprising presence or absence of rewards. All responses to rewards and reward-predicting stimuli depend on event predictability. Dopamine neurons are activated by rewarding events that are better than predicted, remain uninfluenced by events that are as good as predicted, and are depressed by events that are worse than predicted. By signaling rewards according to a prediction error, dopamine responses have the formal characteristics of a teaching signal postulated by reinforcement learning theories. Dopamine responses transfer during learning from primary rewards to reward-predicting stimuli. This may contribute to neuronal mechanisms underlying the retrograde action of rewards, one of the main puzzles in reinforcement learning. The impulse response releases a short pulse of dopamine onto many dendrites, thus broadcasting a rather global reinforcement signal to postsynaptic neurons. This signal may improve approach behavior by providing advance reward information before the behavior occurs, and may contribute to learning by modifying synaptic transmission. 
The dopamine reward signal is supplemented by activity in neurons in striatum, frontal cortex, and amygdala, which process specific reward information but do not emit a global reward prediction error signal. A cooperation between the different reward signals may assure the use of specific rewards for selectively reinforcing behaviors. Among the other projection systems, noradrenaline neurons predominantly serve attentional mechanisms and nucleus basalis neurons code rewards heterogeneously. Cerebellar climbing fibers signal errors in motor performance or errors in the prediction of aversive events to cerebellar Purkinje cells. Most deficits following dopamine-depleting lesions are not easily explained by a defective reward signal but may reflect the absence of a general enabling function of tonic levels of extracellular dopamine. Thus dopamine systems may have two functions, the phasic transmission of reward information and the tonic enabling of postsynaptic neurons.
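The prediction-error account above has the same shape as the temporal-difference error used in reinforcement learning. The sketch below is only an illustration of that formal analogy, not a model from the review. The discount factor, the learning rate, and the function names are assumptions chosen for the example.

```python
def prediction_error(reward, value_now, value_next, gamma=0.9):
    """Temporal-difference error: positive if the outcome is better than
    predicted, zero if as predicted, negative if worse than predicted."""
    return reward + gamma * value_next - value_now


def update_value(value_now, delta, learning_rate=0.1):
    """Move the reward prediction a small step in the direction of the error."""
    return value_now + learning_rate * delta


# A stimulus is repeatedly followed by a reward of 1.0 and then the trial ends
# (value_next = 0). The error starts large and shrinks as the reward becomes
# fully predicted.
value = 0.0
errors = []
for _ in range(200):
    delta = prediction_error(1.0, value, 0.0)
    errors.append(delta)
    value = update_value(value, delta)
```

Once the prediction has converged, omitting the reward gives a negative error, which corresponds to the depression described for outcomes that are worse than predicted.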
“It seems every other day I am reading a story about a massive insider trading scandal
PURPOSE: To improve on current standards for breast cancer prognosis and prediction of chemotherapy benefit by developing a risk model that incorporates the gene expression-based "intrinsic" subtypes luminal A, luminal B, HER2-enriched, and basal-like. METHODS: A 50-gene subtype predictor was developed using microarray and quantitative reverse transcriptase polymerase chain reaction data from 189 prototype samples. Test sets from 761 patients (no systemic therapy) were evaluated for prognosis, and 133 patients were evaluated for prediction of pathologic complete response (pCR) to a taxane and anthracycline regimen. RESULTS: The intrinsic subtypes as discrete entities showed prognostic significance (P = 2.26E-12) and remained significant in multivariable analyses that incorporated standard parameters (estrogen receptor status, histologic grade, tumor size, and node status). A prognostic model for node-negative breast cancer was built using intrinsic subtype and clinical information. The C-index estimate for the combined model (subtype and tumor size) was a significant improvement on either the clinicopathologic model or subtype model alone. The intrinsic subtype model predicted neoadjuvant chemotherapy efficacy with a negative predictive value for pCR of 97%. CONCLUSION: Diagnosis by intrinsic subtype adds significant prognostic and predictive information to standard parameters for patients with breast cancer. The prognostic properties of the continuous risk score will be of value for the management of node-negative breast cancers. The subtypes and risk score can also be used to assess the likelihood of efficacy from neoadjuvant chemotherapy.
BACKGROUND: Although many patients with intermediate-grade or high-grade (aggressive) non-Hodgkin's lymphoma are cured by combination chemotherapy, the remainder are not cured and ultimately die of their disease. The Ann Arbor classification, used to determine the stage of this disease, does not consistently distinguish between patients with different long-term prognoses. This project was undertaken to develop a model for predicting outcome in patients with aggressive non-Hodgkin's lymphoma on the basis of the patients' clinical characteristics before treatment. METHODS: Adults with aggressive non-Hodgkin's lymphoma from 16 institutions and cooperative groups in the United States, Europe, and Canada who were treated between 1982 and 1987 with combination-chemotherapy regimens containing doxorubicin were evaluated for clinical features predictive of overall survival and relapse-free survival. Features that remained independently significant in step-down regression analyses of survival were incorporated into models that identified groups of patients of all ages and groups of patients no more than 60 years old with different risks of death. RESULTS: In 2031 patients of all ages, our model, based on age, tumor stage, serum lactate dehydrogenase concentration, performance status, and number of extranodal disease sites, identified four risk groups with predicted five-year survival rates of 73 percent, 51 percent, 43 percent, and 26 percent. In 1274 patients 60 or younger, an age-adjusted model based on tumor stage, lactate dehydrogenase level, and performance status identified four risk groups with predicted five-year survival rates of 83 percent, 69 percent, 46 percent, and 32 percent. In both models, the increased risk of death was due to both a lower rate of complete responses and a higher rate of relapse from complete response. 
These two indexes, called the international index and the age-adjusted international index, were significantly more accurate than the Ann Arbor classification in predicting long-term survival. CONCLUSIONS: The international index and the age-adjusted international index should be used in the design of future therapeutic trials in patients with aggressive non-Hodgkin's lymphoma and in the selection of appropriate therapeutic approaches for individual patients.
1. Introduction to Predictive Control. 2. A Standard Formulation of Predictive Control. 3. Solving Predictive Control Problems. 4. Step Response and Transfer Function Formulations. 5. Tuning. 6. Stability. 7. Robust Predictive Control. 8. Perspectives. 9. Case Studies. 10. The Model Predictive Control Toolbox. References Appendices A. Some Commercial MPC Products B. MATLAB Program basicmpc C. The MPC Toolbox D. Solutions to Problems
Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process.
A new and relatively simple equation for the soil-water content-pressure head curve, θ(h), is described in this paper. The particular form of the equation enables one to derive closed-form analytical expressions for the relative hydraulic conductivity, K_r, when substituted in the predictive conductivity models of N.T. Burdine or Y. Mualem. The resulting expressions for K_r(h) contain three independent parameters which may be obtained by fitting the proposed soil-water retention model to experimental data. Results obtained with the closed-form analytical expressions based on the Mualem theory are compared with observed hydraulic conductivity data for five soils with a wide range of hydraulic properties. The unsaturated hydraulic conductivity is predicted well in four out of five cases. It is found that a reasonable description of the soil-water retention curve at low water contents is important for an accurate prediction of the unsaturated hydraulic conductivity.
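The abstract does not reproduce the equations, so the sketch below relies on the form in which this retention model and its Mualem-based conductivity expression are commonly cited: effective saturation Se = [1 + (α|h|)^n]^(-m) with m = 1 - 1/n, and K_r = Se^(1/2) [1 - (1 - Se^(1/m))^m]^2. Treat the parameter names (theta_r, theta_s, alpha, n), the function names, and the example values as assumptions made for illustration.

```python
def effective_saturation(h, alpha, n):
    """Effective saturation Se(h), assuming the Mualem restriction m = 1 - 1/n."""
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)


def water_content(h, theta_r, theta_s, alpha, n):
    """Volumetric water content between residual and saturated values."""
    return theta_r + (theta_s - theta_r) * effective_saturation(h, alpha, n)


def relative_conductivity(h, alpha, n):
    """Closed-form relative hydraulic conductivity from the Mualem model."""
    m = 1.0 - 1.0 / n
    se = effective_saturation(h, alpha, n)
    return se ** 0.5 * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


# Hypothetical parameter values, with pressure head in cm and alpha in 1/cm.
theta = water_content(-100.0, theta_r=0.05, theta_s=0.45, alpha=0.01, n=2.0)
k_rel = relative_conductivity(-100.0, alpha=0.01, n=2.0)
```

At zero pressure head the sketch returns the saturated water content and a relative conductivity of 1, and both quantities decrease as the soil dries.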
WoLF PSORT is an extension of the PSORT II program for protein subcellular location prediction. WoLF PSORT converts protein amino acid sequences into numerical localization features based on sorting signals, amino acid composition, and functional motifs such as DNA-binding motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Using HTML, the evidence for each prediction is shown in two ways: (i) a list of proteins of known localization with the most similar localization features to the query, and (ii) tables with detailed information about individual localization features. For convenience, sequence alignments of the query to similar proteins and links to UniProt and Gene Ontology are provided. Taken together, this information allows a user to understand the evidence (or lack thereof) behind the predictions made for particular proteins. WoLF PSORT is available at wolfpsort.org.
The Universal Soil Loss Equation (USLE) enables planners to predict the average rate of soil erosion for each feasible alternative combination of crop system and management practices in association with a specified soil type, rainfall pattern, and topography. When these predicted losses are compared with given soil loss tolerances, they provide specific guidelines for effecting erosion control within specified limits. The equation groups the numerous interrelated physical and management parameters that influence erosion rate under six major factors whose site-specific values can be expressed numerically. A half century of erosion research in many states has supplied information from which at least approximate values of the USLE factors can be obtained for specified farm fields or other small erosion prone areas throughout the United States. Tables and charts presented in this handbook make this information available for field use. Significant limitations in the data are identified. The USLE is an erosion model designed to compute longtime average soil losses from sheet and rill erosion under specified conditions. It is also useful for construction sites and other non-agricultural conditions, but it does not predict deposition and does not compute sediment yields from gully, streambank, and streambed erosion.
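The six-factor structure described above is commonly written as the product A = R × K × L × S × C × P, where R is rainfall erosivity, K soil erodibility, L slope length, S slope steepness, C cover and management, and P support practice. The sketch below assumes that form. The factor values and the tolerance are invented for illustration and are not taken from the handbook's tables.

```python
def usle_soil_loss(r, k, l, s, c, p):
    """Long-term average soil loss as the product of the six USLE factors."""
    return r * k * l * s * c * p


def within_tolerance(predicted_loss, tolerance):
    """Compare a predicted loss against a given soil loss tolerance."""
    return predicted_loss <= tolerance


# Hypothetical field: compare two management alternatives that differ only
# in the cover-management factor C.
base_factors = {"r": 100.0, "k": 0.3, "l": 1.0, "s": 1.0, "p": 0.5}
loss_row_crop = usle_soil_loss(c=0.4, **base_factors)
loss_with_cover = usle_soil_loss(c=0.2, **base_factors)
acceptable = within_tolerance(loss_with_cover, tolerance=5.0)
```

Because the factors multiply, halving any single factor halves the predicted loss, which is what makes the comparison of management alternatives straightforward.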
BACKGROUND: A more accurate means of prognostication in breast cancer will improve the selection of patients for adjuvant systemic therapy. METHODS: Using microarray analysis to evaluate our previously established 70-gene prognosis profile, we classified a series of 295 consecutive patients with primary breast carcinomas as having a gene-expression signature associated with either a poor prognosis or a good prognosis. All patients had stage I or II breast cancer and were younger than 53 years old; 151 had lymph-node-negative disease, and 144 had lymph-node-positive disease. We evaluated the predictive power of the prognosis profile using univariable and multivariable statistical analyses. RESULTS: Among the 295 patients, 180 had a poor-prognosis signature and 115 had a good-prognosis signature, and the mean (+/-SE) overall 10-year survival rates were 54.6+/-4.4 percent and 94.5+/-2.6 percent, respectively. At 10 years, the probability of remaining free of distant metastases was 50.6+/-4.5 percent in the group with a poor-prognosis signature and 85.2+/-4.3 percent in the group with a good-prognosis signature. The estimated hazard ratio for distant metastases in the group with a poor-prognosis signature, as compared with the group with the good-prognosis signature, was 5.1 (95 percent confidence interval, 2.9 to 9.0; P<0.001). This ratio remained significant when the groups were analyzed according to lymph-node status. Multivariable Cox regression analysis showed that the prognosis profile was a strong independent factor in predicting disease outcome. CONCLUSIONS: The gene-expression profile we studied is a more powerful predictor of the outcome of disease in young patients with breast cancer than standard systems based on clinical and histologic criteria.
BACKGROUND: The likelihood of distant recurrence in patients with breast cancer who have no involved lymph nodes and estrogen-receptor-positive tumors is poorly defined by clinical and histopathological measures. METHODS: We tested whether the results of a reverse-transcriptase-polymerase-chain-reaction (RT-PCR) assay of 21 prospectively selected genes in paraffin-embedded tumor tissue would correlate with the likelihood of distant recurrence in patients with node-negative, tamoxifen-treated breast cancer who were enrolled in the National Surgical Adjuvant Breast and Bowel Project clinical trial B-14. The levels of expression of 16 cancer-related genes and 5 reference genes were used in a prospectively defined algorithm to calculate a recurrence score and to determine a risk group (low, intermediate, or high) for each patient. RESULTS: Adequate RT-PCR profiles were obtained in 668 of 675 tumor blocks. The proportions of patients categorized as having a low, intermediate, or high risk by the RT-PCR assay were 51, 22, and 27 percent, respectively. The Kaplan-Meier estimates of the rates of distant recurrence at 10 years in the low-risk, intermediate-risk, and high-risk groups were 6.8 percent (95 percent confidence interval, 4.0 to 9.6), 14.3 percent (95 percent confidence interval, 8.3 to 20.3), and 30.5 percent (95 percent confidence interval, 23.6 to 37.4). The rate in the low-risk group was significantly lower than that in the high-risk group (P<0.001). In a multivariate Cox model, the recurrence score provided significant predictive power that was independent of age and tumor size (P<0.001). The recurrence score was also predictive of overall survival (P<0.001) and could be used as a continuous function to predict distant recurrence in individual patients. CONCLUSIONS: The recurrence score has been validated as quantifying the likelihood of distant recurrence in tamoxifen-treated patients with node-negative, estrogen-receptor-positive breast cancer.
CONTEXT: Patients who have atrial fibrillation (AF) have an increased risk of stroke, but their absolute rate of stroke depends on age and comorbid conditions. OBJECTIVE: To assess the predictive value of classification schemes that estimate stroke risk in patients with AF. DESIGN, SETTING, AND PATIENTS: Two existing classification schemes were combined into a new stroke-risk scheme, the CHADS2 index, and all 3 classification schemes were validated. The CHADS2 was formed by assigning 1 point each for the presence of congestive heart failure, hypertension, age 75 years or older, and diabetes mellitus and by assigning 2 points for history of stroke or transient ischemic attack. Data from peer review organizations representing 7 states were used to assemble a National Registry of AF (NRAF) consisting of 1733 Medicare beneficiaries aged 65 to 95 years who had nonrheumatic AF and were not prescribed warfarin at hospital discharge. MAIN OUTCOME MEASURE: Hospitalization for ischemic stroke, determined by Medicare claims data. RESULTS: During 2121 patient-years of follow-up, 94 patients were readmitted to the hospital for ischemic stroke (stroke rate, 4.4 per 100 patient-years). As indicated by a c statistic greater than 0.5, the 2 existing classification schemes predicted stroke better than chance: c of 0.68 (95% confidence interval [CI], 0.65-0.71) for the scheme developed by the Atrial Fibrillation Investigators (AFI) and c of 0.74 (95% CI, 0.71-0.76) for the Stroke Prevention in Atrial Fibrillation (SPAF) III scheme. However, with a c statistic of 0.82 (95% CI, 0.80-0.84), the CHADS2 index was the most accurate predictor of stroke.
The stroke rate per 100 patient-years without antithrombotic therapy increased by a factor of 1.5 (95% CI, 1.3-1.7) for each 1-point increase in the CHADS2 score: 1.9 (95% CI, 1.2-3.0) for a score of 0; 2.8 (95% CI, 2.0-3.8) for 1; 4.0 (95% CI, 3.1-5.1) for 2; 5.9 (95% CI, 4.6-7.3) for 3; 8.5 (95% CI, 6.3-11.1) for 4; 12.5 (95% CI, 8.2-17.5) for 5; and 18.2 (95% CI, 10.5-27.4) for 6. CONCLUSION: The 2 existing classification schemes and especially a new stroke risk index, CHADS2, can quantify risk of stroke for patients who have AF and may aid in selection of antithrombotic therapy.
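The scoring rule and the per-score stroke rates reported above translate directly into a small lookup. The following is a minimal sketch for illustration only, not a clinical tool. The function and variable names are hypothetical, while the point assignments and rates are the ones stated in the abstract.

```python
# Stroke rate per 100 patient-years without antithrombotic therapy,
# indexed by score, as reported in the abstract above.
STROKE_RATE_PER_100_PATIENT_YEARS = {
    0: 1.9, 1: 2.8, 2: 4.0, 3: 5.9, 4: 8.5, 5: 12.5, 6: 18.2,
}


def stroke_risk_score(heart_failure, hypertension, age_years, diabetes,
                      prior_stroke_or_tia):
    """One point each for congestive heart failure, hypertension, age 75 or
    older, and diabetes; two points for prior stroke or TIA."""
    score = 0
    score += 1 if heart_failure else 0
    score += 1 if hypertension else 0
    score += 1 if age_years >= 75 else 0
    score += 1 if diabetes else 0
    score += 2 if prior_stroke_or_tia else 0
    return score


def reported_stroke_rate(score):
    """Look up the reported stroke rate for a score between 0 and 6."""
    return STROKE_RATE_PER_100_PATIENT_YEARS[score]


# Hypothetical patient: 80 years old with hypertension and a prior TIA.
example_score = stroke_risk_score(False, True, 80, False, True)
example_rate = reported_stroke_rate(example_score)
```

For the hypothetical patient the score is 4 (hypertension, age, and two points for the prior TIA), which corresponds to the reported rate of 8.5 strokes per 100 patient-years.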
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.
The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
This book describes the reasoned action approach, an integrative framework for the prediction and change of human social behavior. It provides an up-to-date review of relevant research, discusses critical issues related to the reasoned action framework, and provides methodological and conceptual tools for the prediction and explanation of social behavior and for designing behavior change interventions.
In a breakthrough experiment, scientists directly imaged how particles pair up in a system that mimics superconductors. Instead of behaving independently, the pairs moved in a synchronized, dance-like pattern, something never predicted before. This suggests a major gap in the classic theory of superconductivity.