TY - JOUR
T1 - Which Test Is Best
T2 - Evaluation of Traditional and Contemporary Statistical Tests for Analysis of Spherical Equivalent Prediction Error
AU - Cannon, Nathan T.
AU - Savini, Giacomo
AU - Pantanelli, Seth M.
AU - Hoffer, Kenneth
AU - Aristodemou, Petros
AU - Riaz, Kamran
AU - Murphy, David
AU - Griffin, David
AU - Berry, Christian
AU - Debellemanière, Guillaume
AU - Gauvin, Mathieu
AU - Wallerstein, Avi
AU - Whang, Woong Joo
AU - Koh, Kyungmin
AU - Negishi, Kazuno
AU - Hayashi, Ken
AU - Hipólito-Fernandes, Diogo
AU - Cooke, David L.
N1 - Publisher Copyright: © 2025 Elsevier Inc.
PY - 2025/5
Y1 - 2025/5
N2 - Purpose: To characterize the performance of traditional and contemporary statistical tests for analysis of spherical equivalent prediction error (SEQ-PE) after cataract surgery, with regard to test significance and self-consistency. Design: Comparison of the utility of statistical tests. Methods: Subjects: Eyes from 5 academic centers and 2 private practices that had cataract surgery and postoperative manifest refraction between March 2011 and December 2022. SEQ-PE data were randomly divided into subsets with sample sizes of 100, 300, 500, 700, and 2600 eyes. Mean absolute error (MAE), median absolute error (MedAE), SD, root mean squared absolute error (RMSAE), and the proportion of eyes within 0.50 diopters (D) of predicted were calculated for 6 power prediction formulas and analyzed using Friedman post hoc Dunn, Cochran Q post hoc McNemar, Eyetemis, and Wilcox-Holladay-Wang-Koch (WHWK) statistical tests. All tests were corrected for multiple comparisons using the Holm correction. Main outcome measures: The percentage of significant relationships (Percent Significance), proportion of inconsistencies (Inconsistency Ratio), and proportion of self-consistent significant relationships (Significance Index) for each statistical test. Results: Analysis was performed on 7839 eyes of 7839 patients. WHWK.MAE (42%), WHWK.SD (41%), Eyetemis.MAE (40%), WHWK.RMSAE (39%), and Dunn.MAE (34%) were more robust, respectively, than the remaining 3 tests by Percent Significance (all P <.001). Dunn.MAE had the best Inconsistency Ratio (0.11) in the 100-eye subsets. The same top 5 tests were most robust by Significance Index (0.39, 0.35, 0.35, 0.34, and 0.31, respectively; all P <.02). WHWK.SD and WHWK.RMSAE had the best Significance Indices (both 0.77) in the 2600-eye subsets. McNemar had the poorest Significance Index overall (0.09). Conclusions: The 5 high-performing tests produced significant results more often and were also self-consistent. WHWK.MAE and McNemar were highest and lowest performing overall, respectively. Dunn.MAE may be useful in sample sizes <150 eyes.
UR - http://www.scopus.com/inward/record.url?scp=85218911463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85218911463&partnerID=8YFLogxK
U2 - 10.1016/j.ajo.2025.01.022
DO - 10.1016/j.ajo.2025.01.022
M3 - Article
C2 - 39922476
AN - SCOPUS:85218911463
SN - 0002-9394
VL - 273
SP - 33
EP - 42
JO - American Journal of Ophthalmology
JF - American Journal of Ophthalmology
ER -