The second version of the NCEP Climate Forecast System (CFSv2) was made operational at NCEP in March 2011. This version has upgrades to nearly all aspects of the data assimilation and forecast model components of the system. A coupled reanalysis was made over a 32-yr period (1979–2010), which provided the initial conditions to carry out a comprehensive reforecast over 29 years (1982–2010). This was done to obtain consistent and stable calibrations, as well as skill estimates for the operational subseasonal and seasonal predictions at NCEP with CFSv2. The operational implementation of the full system ensures a continuity of the climate record and provides a valuable up-to-date dataset to study many aspects of predictability on the seasonal and subseasonal scales. Evaluation of the reforecasts shows that the CFSv2 increases the length of skillful MJO forecasts from 6 to 17 days (dramatically improving subseasonal forecasts), nearly doubles the skill of seasonal forecasts of 2-m temperatures over the United States, and significantly improves global SST forecasts over its predecessor. The CFSv2 not only provides greatly improved guidance at these time scales but also creates many more products for subseasonal and seasonal forecasting with an extensive set of retrospective forecasts for users to calibrate their forecast products. These retrospective and real-time operational forecasts will be used by a wide community of users in their decision-making processes in areas such as water management for rivers and agriculture, transportation, energy use by utilities, wind and other sustainable energy, and seasonal prediction of the hurricane season.
Time series methods are used to make long-run forecasts, with confidence intervals, of age-specific mortality in the United States from 1990 to 2065. First, the logs of the age-specific death rates are modeled as a linear function of an unobserved period-specific intensity index, with parameters depending on age. This model is fit to the matrix of U.S. death rates, 1933 to 1987, using the singular value decomposition (SVD) method; it accounts for almost all the variance over time in age-specific death rates as a group. Whereas e0 has risen at a decreasing rate over the century and has decreasing variability, k(t) declines at a roughly constant rate and has roughly constant variability, facilitating forecasting. k(t), which indexes the intensity of mortality, is next modeled as a time series (specifically, a random walk with drift) and forecast. The method performs very well on within-sample forecasts, and the forecasts are insensitive to reductions in the length of the base period from 90 to 30 years; some instability appears for base periods of 10 or 20 years, however. Forecasts of age-specific rates are derived from the forecasts of k, and other life table variables are derived and presented. These imply an increase of 10.5 years in life expectancy to 86.05 in 2065 (sexes combined), with a confidence band of plus 3.9 or minus 5.6 years, including uncertainty concerning the estimated trend. Whereas 46% now survive to age 80, by 2065 46% will survive to age 90. Of the gains forecast for person-years lived over the life cycle from now until 2065, 74% will occur at age 65 and over. These life expectancy forecasts are substantially lower than direct time series forecasts of e0, and have far narrower confidence bands; however, they are substantially higher than the forecasts of the Social Security Administration's Office of the Actuary.
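The two steps described in this abstract (an SVD fit of log death rates, then a random walk with drift for k(t)) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the normalisation (b sums to 1, so k sums to 0) and the forecast standard error, which ignores uncertainty in the estimated drift, are choices made here for illustration.

```python
import numpy as np

def fit_lee_carter(log_rates):
    """Fit log m(x, t) ~ a(x) + b(x) * k(t) by SVD.

    log_rates: array of shape (ages, years) holding log death rates.
    """
    a = log_rates.mean(axis=1)                  # average age pattern a(x)
    centered = log_rates - a[:, None]
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    b = U[:, 0]                                 # age sensitivities (first left vector)
    k = s[0] * Vt[0]                            # period intensity index
    scale = b.sum()                             # assumed normalisation: sum(b) = 1
    return a, b / scale, k * scale

def forecast_k(k, horizon):
    """Random walk with drift: point forecasts and standard errors.

    The standard error ignores drift-estimation uncertainty (a simplification).
    """
    drift = (k[-1] - k[0]) / (len(k) - 1)
    sigma = (np.diff(k) - drift).std(ddof=1)
    steps = np.arange(1, horizon + 1)
    return k[-1] + drift * steps, sigma * np.sqrt(steps)
```

Forecast log rates at horizon h would then be `a + b * k_forecast[h - 1]`, from which life table quantities can be derived.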
Ensembles used for probabilistic weather forecasting often exhibit a spread-error correlation, but they tend to be underdispersive. This paper proposes a statistical method for postprocessing ensembles based on Bayesian model averaging (BMA), which is a standard method for combining predictive distributions from different sources. The BMA predictive probability density function (PDF) of any quantity of interest is a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are equal to posterior probabilities of the models generating the forecasts and reflect the models' relative contributions to predictive skill over the training period. The BMA weights can be used to assess the usefulness of ensemble members, and this can be used as a basis for selecting ensemble members; this can be useful given the cost of running large ensembles. The BMA PDF can be represented as an unweighted ensemble of any desired size, by simulating from the BMA predictive distribution. The BMA predictive variance can be decomposed into two components, one corresponding to the between-forecast variability, and the second to the within-forecast variability. Predictive PDFs or intervals based solely on the ensemble spread incorporate the first component but not the second. Thus BMA provides a theoretical explanation of the tendency of ensembles to exhibit a spread-error correlation yet be underdispersive. The method was applied to 48-h forecasts of surface temperature in the Pacific Northwest in January–June 2000 using the University of Washington fifth-generation Pennsylvania State University–NCAR Mesoscale Model (MM5) ensemble. The predictive PDFs were much better calibrated than the raw ensemble, and the BMA forecasts were sharp in that 90% BMA prediction intervals were 66% shorter on average than those produced by sample climatology.
As a by-product, BMA yields a deterministic point forecast, and this had root-mean-square errors 7% lower than the best of the ensemble members and 8% lower than the ensemble mean. Similar results were obtained for forecasts of sea level pressure. Simulation experiments show that BMA performs reasonably well when the underlying ensemble is calibrated, or even overdispersed.
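The variance decomposition and the "unweighted ensemble by simulation" idea from this abstract can be illustrated with a small sketch. It assumes Gaussian component PDFs with a common standard deviation `sigma`, forecasts that are already bias-corrected, and weights that are simply given rather than estimated from a training period; all function names are hypothetical.

```python
import numpy as np

def bma_mean_variance(forecasts, weights, sigma):
    """Mean and variance components of a Gaussian-mixture BMA predictive PDF."""
    forecasts = np.asarray(forecasts, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * forecasts)                    # deterministic point forecast
    between = np.sum(weights * (forecasts - mean) ** 2)   # between-forecast variability
    within = sigma ** 2                                   # within-forecast variability
    return mean, between, within

def bma_sample(forecasts, weights, sigma, size, rng):
    """Draw an unweighted ensemble of any size from the BMA predictive PDF."""
    forecasts = np.asarray(forecasts, dtype=float)
    member = rng.choice(len(forecasts), size=size, p=weights)  # pick a component
    return rng.normal(forecasts[member], sigma)                # then draw around it
```

An interval built only from the ensemble spread captures `between` but omits `within`, which is one way to see why raw ensembles tend to be underdispersive.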
This paper provides a general framework for integration of high-frequency intraday data into the measurement, modeling, and forecasting of daily and lower frequency volatility and return distributions. Most procedures for modeling and forecasting financial asset return volatilities, correlations, and distributions rely on restrictive and complicated parametric multivariate ARCH or stochastic volatility models, which often perform poorly at intraday frequencies. Use of realized volatility constructed from high-frequency intraday returns, in contrast, permits the use of traditional time series procedures for modeling and forecasting. Building on the theory of continuous-time arbitrage-free price processes and the theory of quadratic variation, we formally develop the links between the conditional covariance matrix and the concept of realized volatility. Next, using continuously recorded observations for the Deutschemark / Dollar and Yen / Dollar spot exchange rates covering more than a decade, we find that forecasts from a simple long-memory Gaussian vector autoregression for the logarithmic daily realized volatilities perform admirably compared to popular daily ARCH and related models. Moreover, the vector autoregressive volatility forecast, coupled with a parametric lognormal-normal mixture distribution implied by the theoretically and empirically grounded assumption of normally distributed standardized returns, gives rise to well-calibrated density forecasts of future returns, and correspondingly accurate quantile estimates. Our results hold promise for practical modeling and forecasting of the large covariance matrices relevant in asset pricing, asset allocation and financial risk management applications.
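As a rough illustration of the central quantity in this abstract, daily realized variance can be computed as the sum of squared intraday log returns. The sketch below assumes a vector of evenly spaced intraday prices for a single day; the function names are hypothetical, and the long-memory vector autoregression used in the paper is not reproduced here.

```python
import numpy as np

def realized_variance(intraday_prices):
    """Sum of squared intraday log returns for one trading day."""
    returns = np.diff(np.log(np.asarray(intraday_prices, dtype=float)))
    return float(np.sum(returns ** 2))

def log_realized_volatility(intraday_prices):
    """Log of realized volatility, the series the abstract models with a VAR."""
    return 0.5 * np.log(realized_variance(intraday_prices))
```

A series of daily `log_realized_volatility` values can then be treated as an ordinary observed time series and modeled with traditional procedures.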
Two separate sets of forecasts of airline passenger data have been combined to form a composite set of forecasts. The main conclusion is that the composite set of forecasts can yield lower mean-square error than either of the original forecasts. Past errors of each of the original forecasts are used to determine the weights to attach to these two original forecasts in forming the combined forecasts, and different methods of deriving these weights are examined.
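One simple way to derive the weights from past errors is to weight each forecast in inverse proportion to its past sum of squared errors. The sketch below shows only that scheme; it ignores correlation between the two error series, and the abstract notes that several weighting methods were examined. Function names are hypothetical.

```python
import numpy as np

def combination_weight(errors_1, errors_2):
    """Weight on forecast 1, from the past squared errors of both forecasts."""
    s1 = np.sum(np.asarray(errors_1, dtype=float) ** 2)
    s2 = np.sum(np.asarray(errors_2, dtype=float) ** 2)
    return s2 / (s1 + s2)        # the more accurate forecast gets more weight

def combine(forecast_1, forecast_2, weight_1):
    """Composite forecast as a weighted average of the two originals."""
    return weight_1 * np.asarray(forecast_1) + (1 - weight_1) * np.asarray(forecast_2)
```

When the two error series are not too strongly correlated, the composite mean-square error can fall below that of either original forecast.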
BACKGROUND: Given the projected trends in population ageing and population growth, the number of people with dementia is expected to increase. In addition, strong evidence has emerged supporting the importance of potentially modifiable risk factors for dementia. Characterising the distribution and magnitude of anticipated growth is crucial for public health planning and resource prioritisation. This study aimed to improve on previous forecasts of dementia prevalence by producing country-level estimates and incorporating information on selected risk factors. METHODS: We forecasted the prevalence of dementia attributable to the three dementia risk factors included in the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2019 (high body-mass index, high fasting plasma glucose, and smoking) from 2019 to 2050, using relative risks and forecasted risk factor prevalence to predict GBD risk-attributable prevalence in 2050 globally and by world region and country. Using linear regression models with education included as an additional predictor, we then forecasted the prevalence of dementia not attributable to GBD risks. To assess the relative contribution of future trends in GBD risk factors, education, population growth, and population ageing, we did a decomposition analysis. FINDINGS: We estimated that the number of people with dementia would increase from 57·4 (95% uncertainty interval 50·4-65·1) million cases globally in 2019 to 152·8 (130·8-175·9) million cases in 2050. Despite large increases in the projected number of people living with dementia, age-standardised both-sex prevalence remained stable between 2019 and 2050 (global percentage change of 0·1% [-7·5 to 10·8]). We estimated that there were more women with dementia than men with dementia globally in 2019 (female-to-male ratio of 1·69 [1·64-1·73]), and we expect this pattern to continue to 2050 (female-to-male ratio of 1·67 [1·52-1·85]). 
There was geographical heterogeneity in the projected increases across countries and regions, with the smallest percentage changes in the number of projected dementia cases in high-income Asia Pacific (53% [41-67]) and western Europe (74% [58-90]), and the largest in north Africa and the Middle East (367% [329-403]) and eastern sub-Saharan Africa (357% [323-395]). Projected increases in cases could largely be attributed to population growth and population ageing, although their relative importance varied by world region, with population growth contributing most to the increases in sub-Saharan Africa and population ageing contributing most to the increases in east Asia. INTERPRETATION: Growth in the number of individuals living with dementia underscores the need for public health planning efforts and policy to address the needs of this group. Country-level estimates can be used to inform national planning efforts and decisions. Multifaceted approaches, including scaling up interventions to address modifiable risk factors and investing in research on biological mechanisms, will be key in addressing the expected increases in the number of individuals affected by dementia. FUNDING: Bill & Melinda Gates Foundation and Gates Ventures.
Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable and high-quality forecasts—especially when there are a variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical approach to forecasting “at scale” that combines configurable models with analyst-in-the-loop performance analysis. We propose a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series. We describe performance analyses to compare and evaluate forecasting procedures, and automatically flag forecasts for manual review and adjustment. Tools that help analysts to use their expertise most effectively enable reliable, practical forecasting of business time series.
Automatic forecasts of large numbers of univariate time series are often needed in business and other contexts. We describe two automatic forecasting algorithms that have been implemented in the forecast package for R. The first is based on innovations state space models that underlie exponential smoothing methods. The second is a step-wise algorithm for forecasting with ARIMA models. The algorithms are applicable to both seasonal and non-seasonal data, and are compared and illustrated using four real time series. We also briefly describe some of the other functionality available in the forecast package.
Recently, there has been a surge of Transformer-based solutions for the long-term time series forecasting (LTSF) task. Despite the growing performance over the past few years, we question the validity of this line of research in this work. Specifically, Transformers are arguably the most successful solution to extract the semantic correlations among the elements in a long sequence. However, in time series modeling, we are to extract the temporal relations in an ordered set of continuous points. While employing positional encoding and using tokens to embed sub-series in Transformers facilitate preserving some ordering information, the nature of the permutation-invariant self-attention mechanism inevitably results in temporal information loss. To validate our claim, we introduce a set of embarrassingly simple one-layer linear models named LTSF-Linear for comparison. Experimental results on nine real-life datasets show that LTSF-Linear surprisingly outperforms existing sophisticated Transformer-based LTSF models in all cases, and often by a large margin. Moreover, we conduct comprehensive empirical studies to explore the impacts of various design elements of LTSF models on their temporal relation extraction capability. We hope this surprising finding opens up new research directions for the LTSF task. We also advocate revisiting the validity of Transformer-based solutions for other time series analysis tasks (e.g., anomaly detection) in the future.
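To make the idea of a one-layer linear forecaster concrete, the sketch below maps a look-back window directly to the whole forecast horizon with a single weight matrix. It is not the LTSF-Linear implementation: the weights are fitted here by ordinary least squares with NumPy rather than by gradient-based training, and the function names are hypothetical.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (look-back window, future horizon) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_linear(X, Y):
    """One linear layer (weights plus bias), fitted by least squares."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W                                     # shape: (lookback + 1, horizon)

def predict(W, window):
    """Forecast the full horizon in one shot from a single window."""
    return np.append(window, 1.0) @ W
```

Because each output step is a fixed weighted sum of ordered past values, the temporal ordering of the inputs is preserved by construction.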
As the power system is facing a transition toward a more intelligent, flexible, and interactive system with higher penetration of renewable energy generation, load forecasting, especially short-term load forecasting for individual electric customers, plays an increasingly essential role in future grid planning and operation. Unlike aggregated residential load at a large scale, the electric load of a single energy user is fairly challenging to forecast because of the high volatility and uncertainty involved. In this paper, we propose a framework based on the long short-term memory (LSTM) recurrent neural network, one of the latest and most popular deep learning techniques, to tackle this tricky issue. The proposed framework is tested on a publicly available set of real residential smart meter data, and its performance is comprehensively compared with various benchmarks, including the state of the art in the field of load forecasting. The proposed LSTM approach outperforms the other listed rival algorithms in the task of short-term load forecasting for individual residential households.
The growing use of computers for mechanized inventory control and production planning has brought with it the need for explicit forecasts of sales and usage for individual products and materials. These forecasts must be made on a routine basis for thousands of products, so that they must be made quickly, and, both in terms of computing time and information storage, cheaply; they should be responsive to changing conditions. The paper presents a method of forecasting sales which has these desirable characteristics, and which in terms of ability to forecast compares favorably with other, more traditional methods. Several models of the exponential forecasting system are presented, along with several examples of application.
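The simplest member of the exponential forecasting family can be sketched in a few lines. It shows why the approach is cheap in computing time and storage: only one number per product is carried forward. This shows simple exponential smoothing only; the paper's models also cover other cases, which are not reproduced here, and the function name and starting value are assumptions made for illustration.

```python
def exponential_smoothing_forecast(values, alpha):
    """One-step-ahead forecast by simple exponential smoothing.

    alpha in (0, 1]: larger values respond faster to changing conditions.
    The level is initialised at the first observation (an assumption).
    """
    level = values[0]
    for x in values[1:]:
        # New level = weighted average of the latest value and the old level.
        level = alpha * x + (1 - alpha) * level
    return level
```

For example, `exponential_smoothing_forecast([10, 12, 11, 13], alpha=0.5)` updates the level through 11, 11 and 12, and returns 12 as the forecast for the next period.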
The NCEP Climate Forecast System Reanalysis (CFSR) was completed for the 31-yr period from 1979 to 2009, in January 2010. The CFSR was designed and executed as a global, high-resolution coupled atmosphere–ocean–land surface–sea ice system to provide the best estimate of the state of these coupled domains over this period. The current CFSR will be extended as an operational, real-time product into the future. New features of the CFSR include 1) coupling of the atmosphere and ocean during the generation of the 6-h guess field, 2) an interactive sea ice model, and 3) assimilation of satellite radiances by the Gridpoint Statistical Interpolation (GSI) scheme over the entire period. The CFSR global atmosphere resolution is ~38 km (T382) with 64 levels extending from the surface to 0.26 hPa. The global ocean's latitudinal spacing is 0.25° at the equator, extending to a global 0.5° beyond the tropics, with 40 levels to a depth of 4737 m. The global land surface model has four soil levels and the global sea ice model has three layers. The CFSR atmospheric model has observed variations in carbon dioxide (CO2) over the 1979–2009 period, together with changes in aerosols and other trace gases and solar variations. Most available in situ and satellite observations were included in the CFSR. Satellite observations were used in radiance form, rather than retrieved values, and were bias corrected with “spin up” runs at full resolution, taking into account variable CO2 concentrations. This procedure enabled the smooth transitions of the climate record resulting from evolutionary changes in the satellite observing system. CFSR atmospheric, oceanic, and land surface output products are available at an hourly time resolution and a horizontal resolution of 0.5° latitude × 0.5° longitude. The CFSR data will be distributed by the National Climatic Data Center (NCDC) and NCAR. 
This reanalysis will serve many purposes, including providing the basis for most of the NCEP Climate Prediction Center's operational climate products by defining the mean states of the atmosphere, ocean, land surface, and sea ice over the next 30-yr climate normal (1981–2010); providing initial conditions for historical forecasts that are required to calibrate operational NCEP climate forecasts (from week 2 to 9 months); and providing estimates and diagnoses of the Earth's climate state over the satellite data period for community climate research. Preliminary analysis of the CFSR output indicates a product that is far superior in most respects to the reanalysis of the mid-1990s. The previous NCEP–NCAR reanalyses have been among the most used NCEP products in history; there is every reason to believe the CFSR will supersede these older products both in scope and quality, because it is higher in time and space resolution, covers the atmosphere, ocean, sea ice, and land, and was executed in a coupled mode with a more modern data assimilation system and forecast model.
This article considers forecasting a single time series when there are many predictors (N) and time series observations (T). When the data follow an approximate factor model, the predictors can be summarized by a small number of indexes, which we estimate using principal components. Feasible forecasts are shown to be asymptotically efficient in the sense that the difference between the feasible forecasts and the infeasible forecasts constructed using the actual values of the factors converges in probability to 0 as both N and T grow large. The estimated factors are shown to be consistent, even in the presence of time variation in the factor model.
This article studies forecasting a macroeconomic time series variable using a large number of predictors. The predictors are summarized using a small number of indexes constructed by principal component analysis. An approximate dynamic factor model serves as the statistical framework for the estimation of the indexes and construction of the forecasts. The method is used to construct 6-, 12-, and 24-month-ahead forecasts for eight monthly U.S. macroeconomic time series using 215 predictors in simulated real time from 1970 through 1998. During this sample period these new forecasts outperformed univariate autoregressions, small vector autoregressions, and leading indicator models.
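The two-step procedure described in this and the preceding abstract (estimate a few indexes by principal components, then regress the target on them) can be sketched as below. This is a simplified illustration with hypothetical function names: it standardises the predictors, takes the first r principal components via SVD, and omits lags of the target and the selection of r.

```python
import numpy as np

def estimate_factors(X, r):
    """First r principal components of the standardised T x N predictor panel."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r] * s[:r]                      # T x r estimated indexes

def diffusion_index_forecast(y, X, r, h):
    """h-step-ahead forecast of y from factors estimated on X."""
    F = estimate_factors(X, r)
    T = len(y)
    # Regress y[t + h] on a constant and the factors at time t.
    A = np.hstack([np.ones((T - h, 1)), F[:T - h]])
    beta, *_ = np.linalg.lstsq(A, y[h:], rcond=None)
    # Apply the fitted relation to the most recent factor values.
    return float(np.append(1.0, F[-1]) @ beta)
```

The factors are only identified up to sign and scale, which does not matter for the forecast because the regression coefficients absorb any such rescaling.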
Load forecasting has become one of the major areas of research in electrical engineering, and most traditional forecasting models and artificial intelligence techniques have been tried out in this task. Artificial neural networks (NNs) have lately received much attention, and a great number of papers have reported successful experiments and practical tests with them. Nevertheless, some authors remain skeptical, and believe that the advantages of using NNs in forecasting have not been systematically proved yet. In order to investigate the reasons for such skepticism, this review examines a collection of papers (published between 1991 and 1999) that report the application of NNs to short-term load forecasting. Our aim is to help to clarify the issue, by critically evaluating the ways in which the NNs proposed in these papers were designed and tested.
Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.
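The probability integral transform (PIT) histogram mentioned above can be sketched for the special case of Gaussian predictive distributions. The Gaussian assumption, the function names and the use of average 90% interval width as a sharpness summary are choices made here for illustration; the paper's framework is more general.

```python
import numpy as np
from math import erf, sqrt

def pit_values(obs, mu, sigma):
    """PIT: the predictive CDF evaluated at each observation (Gaussian case)."""
    return np.array([0.5 * (1 + erf((y - m) / (s * sqrt(2))))
                     for y, m, s in zip(obs, mu, sigma)])

def pit_histogram(pit, bins=10):
    """Relative frequencies of PIT values; roughly flat if calibrated."""
    counts, _ = np.histogram(pit, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()

def mean_interval_width(sigma, z=1.645):
    """Sharpness summary: average width of central 90% Gaussian intervals."""
    return float(np.mean(2 * z * np.asarray(sigma, dtype=float)))
```

A U-shaped histogram points to underdispersed forecasts, a hump-shaped one to overdispersed forecasts. Sharpness is judged from the forecasts alone, so narrower intervals are preferred only among forecasters whose histograms look flat.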
We examine the accuracy and contribution of the Merton distance to default (DD) model, which is based on Merton's (1974) bond pricing model. We compare the model to a "naïve" alternative, which uses the functional form suggested by the Merton model but does not solve the model for an implied probability of default. We find that the naïve predictor performs slightly better in hazard models and in out-of-sample forecasts than both the Merton DD model and a reduced-form model that uses the same inputs. Several other forecasting variables are also important predictors, and fitted values from an expanded hazard model outperform Merton DD default probabilities out of sample. Implied default probabilities from credit default swaps and corporate bond yield spreads are only weakly correlated with Merton DD probabilities after adjusting for agency ratings and bond characteristics. We conclude that while the Merton DD model does not produce a sufficient statistic for the probability of default, its functional form is useful for forecasting defaults. The Author 2008. Published by Oxford University Press on behalf of The Society for Financial Studies.
A new sequential data assimilation method is discussed. It is based on forecasting the error statistics using Monte Carlo methods, a better alternative than solving the traditional and computationally extremely demanding approximate error covariance equation used in the extended Kalman filter. The unbounded error growth found in the extended Kalman filter, which is caused by an overly simplified closure in the error covariance equation, is completely eliminated. Open boundaries can be handled as long as the ocean model is well posed. Well‐known numerical instabilities associated with the error covariance equation are avoided because storage and evolution of the error covariance matrix itself are not needed. The results are also better than what is provided by the extended Kalman filter since there is no closure problem and the quality of the forecast error statistics therefore improves. The method should be feasible also for more sophisticated primitive equation models. The computational load for reasonable accuracy is only a fraction of what is required for the extended Kalman filter and is given by the storage of, say, 100 model states for an ensemble size of 100 and thus CPU requirements of the order of the cost of 100 model integrations. The proposed method can therefore be used with realistic nonlinear ocean models on large domains on existing computers, and it is also well suited for parallel computers and clusters of workstations where each processor integrates a few members of the ensemble.
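The core idea, forecasting error statistics by integrating an ensemble of model states instead of evolving an error covariance equation, can be sketched as follows. The analysis step shown is the commonly used perturbed-observation form, which the abstract itself does not spell out. The function names and the linear observation operator `H` are assumptions made for illustration.

```python
import numpy as np

def forecast_ensemble(ensemble, model, steps):
    """Integrate every member with the (possibly nonlinear) model."""
    for _ in range(steps):
        ensemble = np.array([model(x) for x in ensemble])
    return ensemble                               # shape: (members, state_dim)

def enkf_update(ensemble, H, y, R, rng):
    """Analysis step using covariances estimated from the ensemble itself."""
    n = ensemble.shape[0]
    A = ensemble - ensemble.mean(axis=0)          # state anomalies
    HA = A @ H.T                                  # anomalies in observation space
    # Only the products P H^T and H P H^T are formed, never the full P.
    PHt = A.T @ HA / (n - 1)
    S = HA.T @ HA / (n - 1) + R
    K = PHt @ np.linalg.inv(S)                    # Kalman gain
    # Each member assimilates its own perturbed copy of the observation.
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    return ensemble + (y_pert - ensemble @ H.T) @ K.T
```

Storage scales with the number of members times the state dimension, and each member can be integrated independently, which is why the approach suits parallel computers.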
First course in Econometrics in Economics Departments at better schools, also Economic/Business Forecasting. Statistics prerequisite but no calculus. Slightly higher level and more comprehensive than Gujarati (M-H, 1996). P-R covers more time series and forecasting. P-R coverage is a notch below Johnston-DiNardo (M-H, 97) and requires no matrix algebra. Includes data disk.
Urban land-cover change threatens biodiversity and affects ecosystem productivity through loss of habitat, biomass, and carbon storage. However, despite projections that world urban populations will increase to nearly 5 billion by 2030, little is known about future locations, magnitudes, and rates of urban expansion. Here we develop spatially explicit probabilistic forecasts of global urban land-cover change and explore the direct impacts on biodiversity hotspots and tropical carbon biomass. If current trends in population density continue and all areas with high probabilities of urban expansion undergo change, then by 2030, urban land cover will increase by 1.2 million km², nearly tripling the global urban land area circa 2000. This increase would result in considerable loss of habitats in key biodiversity hotspots, with the highest rates of forecasted urban growth to take place in regions that were relatively undisturbed by urban development in 2000: the Eastern Afromontane, the Guinean Forests of West Africa, and the Western Ghats and Sri Lanka hotspots. Within the pan-tropics, loss in vegetation biomass from areas with high probability of urban expansion is estimated to be 1.38 PgC (0.05 PgC yr⁻¹), equal to ∼5% of emissions from tropical deforestation and land-use change. Although urbanization is often considered a local issue, the aggregate global impacts of projected urban expansion will require significant policy changes to affect future growth trajectories to minimize global biodiversity and vegetation carbon losses.