The effects of imputing missing data on ensemble temperature forecasts

Tyler C. McCandless, Sue Ellen Haupt, George S. Young

Research output: Contribution to journal › Article › peer-review

14 Scopus citations


A major issue in developing post-processing methods for numerical weather prediction (NWP) systems is the need for complete training datasets. Without a complete dataset, it can become difficult, if not impossible, to train and verify statistical post-processing techniques, including ensemble consensus forecasting schemes. In addition, when ensemble forecast data are missing, the consensus forecast weighting scheme becomes difficult to use in real time, and the quality of the uncertainty information derived from the ensemble is reduced. To ameliorate these problems, this study analyzes the treatment of missing data in ensemble model temperature forecasts. The goal is to determine which replacement method produces the lowest Mean Absolute Error (MAE) of consensus forecasts while preserving ensemble calibration. Several methods of replacing missing data are explored: persistence, a Fourier fit that captures seasonal variability, ensemble mean substitution, a three-day mean deviation, and an Artificial Neural Network (ANN). The analysis is performed on 48-hour temperature forecasts for ten locations in the Pacific Northwest. The methods are evaluated by their effect on the performance of two ensemble post-processing methods: an equal-weight consensus forecast and a performance-weighted consensus forecast computed over a ten-day window. The methods are also assessed with rank histograms to determine whether they preserve ensemble calibration. For both post-processing techniques, all imputation methods except ensemble mean substitution produce MAEs not significantly different from those obtained when all ensemble members are available. However, only the three-day mean deviation and the ANN produce rank histograms similar to the non-imputed baseline (i.e., appropriately calibrated ensembles) at all locations. Persistence, ensemble mean, and Fourier substitution do not consistently produce appropriately calibrated ensembles. The three-day mean deviation has the added advantage of being computationally efficient in a real-time forecasting environment.
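The imputation and verification steps described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' code. The grid layout, the function names (`impute_persistence`, `impute_ensemble_mean`, `impute_three_day_deviation`, `fill`, `rank_histogram`), and the toy numbers are all invented for this sketch. The three-day mean deviation is interpreted here as an assumption: today's ensemble mean plus the member's average deviation from the ensemble mean over the previous three days. The Fourier and ANN methods are omitted.

```python
from typing import Callable, List, Optional

# forecasts[day][member]; None marks a missing member forecast (hypothetical layout)
Grid = List[List[Optional[float]]]


def available_mean(row: List[Optional[float]]) -> float:
    """Mean of the members that are present on one day."""
    vals = [v for v in row if v is not None]
    return sum(vals) / len(vals)


def impute_ensemble_mean(f: Grid, day: int, m: int) -> float:
    """Replace the missing member with the mean of the available members."""
    return available_mean(f[day])


def impute_persistence(f: Grid, day: int, m: int) -> float:
    """Reuse the same member's forecast from the previous day.
    Falls back to the ensemble mean on the first day (an assumption)."""
    if day > 0 and f[day - 1][m] is not None:
        return f[day - 1][m]
    return available_mean(f[day])


def impute_three_day_deviation(f: Grid, day: int, m: int, window: int = 3) -> float:
    """Assumed reading of the three-day mean deviation: today's ensemble mean
    plus the member's average deviation from the ensemble mean over the
    previous `window` days."""
    devs = [f[d][m] - available_mean(f[d])
            for d in range(max(0, day - window), day)
            if f[d][m] is not None]
    bias = sum(devs) / len(devs) if devs else 0.0
    return available_mean(f[day]) + bias


def fill(f: Grid, method: Callable[[Grid, int, int], float]) -> Grid:
    """Return a copy of the grid with every missing value imputed.
    Days are filled in order, so earlier imputed values can feed later ones."""
    out: Grid = [row[:] for row in f]
    for d, row in enumerate(out):
        for m, v in enumerate(row):
            if v is None:
                row[m] = method(out, d, m)
    return out


def equal_weight_consensus(f: Grid) -> List[float]:
    """Equal-weight consensus: the plain mean of all members each day."""
    return [sum(row) / len(row) for row in f]


def mae(pred: List[float], obs: List[float]) -> float:
    """Mean absolute error between consensus forecasts and observations."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rank_histogram(f: Grid, obs: List[float]) -> List[int]:
    """For each case, count how many members fall below the observation.
    A roughly flat histogram suggests a calibrated ensemble.
    Ties are not randomized here, which is a simplification."""
    counts = [0] * (len(f[0]) + 1)
    for row, o in zip(f, obs):
        counts[sum(1 for v in row if v < o)] += 1
    return counts


if __name__ == "__main__":
    # Toy data, invented for illustration: member 1 runs 1 degree above the mean.
    forecasts: Grid = [
        [10.0, 12.0, 11.0],
        [11.0, 13.0, 12.0],
        [12.0, 14.0, 13.0],
        [13.0, None, 14.0],  # member 1 missing on the last day
    ]
    obs = [11.0, 12.5, 13.5, 14.2]
    for name, method in [("persistence", impute_persistence),
                         ("ensemble mean", impute_ensemble_mean),
                         ("3-day mean deviation", impute_three_day_deviation)]:
        filled = fill(forecasts, method)
        print(name, filled[3][1],
              round(mae(equal_weight_consensus(filled), obs), 3),
              rank_histogram(filled, obs))
```

In this toy case, the deviation-based fill restores member 1's consistent +1 offset (14.5). Ensemble mean substitution places the member at the center of the available members (13.5), which narrows the ensemble spread. With four cases, the numbers only demonstrate the mechanics and say nothing about which method performs better.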

Original language: English (US)
Pages (from-to): 162-171
Number of pages: 10
Journal: Journal of Computers
Issue number: 2
State: Published, 2011

All Science Journal Classification (ASJC) codes

  • General Computer Science


