Corrigendum to “Performance of an open machine learning model to classify sleep/wake from actigraphy across ∼24-hour intervals without knowledge of rest timing” [Sleep Health 9 (2023) 596-610, (S2352721823001341), (10.1016/j.sleh.2023.07.001)]

Daniel M. Roberts, Margeaux M. Schade, Lindsay Master, Vasant G. Honavar, Nicole G. Nahmod, Anne Marie Chang, Daniel Gartenberg, Orfeu M. Buxton

Research output: Contribution to journal › Comment/debate › peer-review


The authors regret a labeling error affecting 12 of the 220 days of data used within the main analysis: 10 days within the “Deep Sleeping” dataset, and 2 days within the “EcoSleep” dataset. Specifically, a daylight saving time change that occurred after actigraphy data collection started but before participation in the staged days led to incorrect time synchronization between the actigraphy and sleep staging on those 12 days. Although these days are a minority of the records, this mislabeling of ground truth likely impaired model training, and also decreased the reported performance for all classification types reported in the original manuscript, which all referenced the same ground truth labels. This error has been corrected in the current manuscript.

In comparison to the earlier proof, the overall conclusions remain, though some statistical comparisons between the classifiers on the various metrics have crossed the .05 significance threshold, either becoming statistically significant or no longer reaching statistical significance. Specifically, within the main manuscript, NPV for the ∼24-hour interval no longer reaches significance, while PPV for the in-bed interval does now reach significance (Table 6). Bias in SE for the in-bed interval no longer reaches significance, while MAE in SOL and CCC in SE for the in-bed interval do now reach significance (Table 7). Similarly, some statistical tests within the Supplementary Material have crossed the .05 threshold in either direction, though the overall conclusions remain the same. The authors would like to apologize for any inconvenience caused.

For the computation of the updated results within the corrigendum, the Keras Tuner package was updated from version 1.1.0 to 1.1.3.

Epoch-by-epoch performance is indicated within Table 6.
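As a rough illustration of the kind of time-synchronization error described above, the following sketch shows how a one-hour clock change between recording start and a staged night shifts label alignment. It is not the authors' pipeline: the 30-second epoch length, the two UTC offsets, the date, and every name in the code are assumptions chosen only for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumed values for illustration only (not taken from the corrigendum).
EPOCH = timedelta(seconds=30)                        # hypothetical epoch length
OFFSET_AT_START = timezone(timedelta(hours=-4))      # offset when the device clock was set
OFFSET_AT_STAGING = timezone(timedelta(hours=-5))    # offset in effect on the staged night


def clock_reading(instant_utc, tz):
    """Return the naive clock reading that a clock running in `tz` would show."""
    return instant_utc.astimezone(tz).replace(tzinfo=None)


# One true instant on a hypothetical staged night.
instant = datetime(2020, 11, 5, 4, 0, tzinfo=timezone.utc)

device_reading = clock_reading(instant, OFFSET_AT_START)     # device never adjusted
staging_reading = clock_reading(instant, OFFSET_AT_STAGING)  # wall clock after the change

# Matching the two naive readings directly misaligns the labels by this many epochs.
misaligned_epochs = (device_reading - staging_reading) // EPOCH

# Re-attaching the correct offset to each reading restores agreement.
realigned = (
    device_reading.replace(tzinfo=OFFSET_AT_START)
    == staging_reading.replace(tzinfo=OFFSET_AT_STAGING)
)

print(misaligned_epochs, realigned)
```

Under these assumptions, every ground-truth label on an affected night would be paired with activity counts recorded one hour away from it, which is consistent with the corrigendum's note that both training and evaluation were affected.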
At the ∼24-hour interval, the TCN model produces more favorable epoch-by-epoch performance than the Oakley classifier on nearly all the metrics evaluated, excepting sensitivity and NPV, which do not statistically differ. When restricting the performance evaluation to only the known in-bed interval, the TCN shows more favorable epoch-by-epoch performance on accuracy, PPV, NPV, F1-score, Matthews correlation coefficient, and PABAK, while the remaining measures do not statistically differ between the two classifiers.

Fig. 3 displays ROC curves for the TCN classifier, separately for ∼24-hour and in-bed evaluation. The ROC curve depicts the trade-off between the true positive rate (sensitivity) and the false positive rate (1 – specificity) as the probability threshold for classification is altered. The model performs more favorably across the ∼24-hour interval than the in-bed interval, which is also reflected numerically by the AUC values in Table 6.

Table 7 displays confusion matrices for the performance of the Oakley and TCN classifiers, at both evaluation intervals. The confusion matrices reiterate the increased specificity for the TCN model at ∼24-hour evaluation that had been demonstrated statistically in Table 6. In addition, by summing within columns, the base rates of sleep and wake within each interval can be obtained.

Discrepancies between each classifier and ground truth within the ∼24-hour and in-bed intervals are indicated in Table 8 for bias and MAE, and Table 9 for CCC/CCCLON. Across the ∼24-hour interval, relative to the Oakley classifications, the amount of TST predicted by the TCN model more closely matches ground truth in terms of bias, MAE, and both variants of CCC. Scatterplots and Bland-Altman plots visualizing the relationship between true and predicted TST across the ∼24-hour interval are shown in Fig. 4. Within the in-bed interval, the classifiers have fewer differences in terms of reproducing ground truth.
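The epoch-by-epoch measures named above can all be derived from a single 2×2 confusion matrix. The sketch below uses the standard textbook definitions and assumes sleep is the positive class and that columns hold the ground-truth classes, so that column sums give base rates. The counts are invented for illustration and are not values from Table 7, and the function name `epoch_metrics` is hypothetical.

```python
import math


def epoch_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts.

    Assumed layout: sleep is the positive class.
    tp: true sleep predicted sleep, fn: true sleep predicted wake,
    fp: true wake predicted sleep,  tn: true wake predicted wake.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": accuracy,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
        # For a binary outcome, PABAK reduces to 2 * observed agreement - 1.
        "pabak": 2 * accuracy - 1,
        # Column sum over ground-truth sleep, divided by the total epoch count.
        "base_rate_sleep": (tp + fn) / n,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = epoch_metrics(tp=800, fn=50, fp=100, tn=1050)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the base rate of sleep differs between the ∼24-hour and in-bed intervals, measures such as PPV and NPV are not directly comparable across the two intervals even for the same classifier.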
The Oakley classifier performs significantly better in terms of bias in SE or WASO, while the TCN classifier performs significantly better in terms of bias, MAE, and both variations of CCC in SOL, and CCC in SE. Scatterplots and Bland-Altman plots visualizing the relationship between true and predicted values for these metrics across the in-bed interval are shown in Fig. 5.

The grant number U02AG060408 should instead read U2CAG060408.
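For readers unfamiliar with the agreement statistics reported for TST, SE, SOL, and WASO, the following is a minimal sketch of bias, MAE, and Lin's concordance correlation coefficient (CCC) for paired true and predicted values. It assumes bias is defined as predicted minus true and uses population (divide-by-n) moments for CCC; the corrigendum does not state either convention, and the second CCC variant it mentions is not covered here. The function name and the sample values are hypothetical.

```python
def agreement(true_values, predicted_values):
    """Return (bias, MAE, Lin's CCC) for paired per-day summary measures."""
    n = len(true_values)
    diffs = [p - t for t, p in zip(true_values, predicted_values)]
    bias = sum(diffs) / n                      # assumed sign: predicted minus true
    mae = sum(abs(d) for d in diffs) / n

    mean_t = sum(true_values) / n
    mean_p = sum(predicted_values) / n
    var_t = sum((t - mean_t) ** 2 for t in true_values) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted_values) / n
    cov = sum(
        (t - mean_t) * (p - mean_p)
        for t, p in zip(true_values, predicted_values)
    ) / n

    # Lin's CCC penalizes both scatter and systematic offset from the identity line.
    ccc = 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
    return bias, mae, ccc


# Hypothetical TST values in minutes; predictions overestimate by a constant 10 minutes.
true_tst = [400, 420, 450, 380]
pred_tst = [410, 430, 460, 390]
bias, mae, ccc = agreement(true_tst, pred_tst)
print(bias, mae, round(ccc, 3))
```

In this example the predictions are perfectly correlated with the true values, yet CCC falls below 1 because of the constant offset. That is the property that makes CCC a stricter measure of agreement than correlation alone.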

Original language: English (US)
Pages (from-to): 255-260
Number of pages: 6
Journal: Sleep Health
Issue number: 2
State: Accepted/In press - 2024

All Science Journal Classification (ASJC) codes

  • Health (social science)
  • Neuropsychology and Physiological Psychology
  • Social Sciences (miscellaneous)
  • Behavioral Neuroscience
