TY - JOUR
T1 - Chance-Corrected Interrater Agreement Statistics for Two-Rater Dichotomous Responses
T2 - A Method Review With Comparative Assessment Under Possibly Correlated Decisions
AU - Tian, Zizhong
AU - Chinchilli, Vernon M.
AU - Shen, Chan
AU - Zhou, Shouhao
N1 - Publisher Copyright: © 2025 The Author(s). International Statistical Review published by John Wiley & Sons Ltd on behalf of International Statistical Institute.
PY - 2025
Y1 - 2025
N2 - Measurement of the interrater agreement (IRA) is critical for assessing the reliability and validity of ratings in various disciplines. While numerous IRA statistics have been developed, there is a lack of guidance on selecting appropriate measures, especially when raters' decisions could be correlated. To address this gap, we review a family of chance-corrected IRA statistics for two-rater dichotomous-response cases, a fundamental setting that not only serves as the theoretical foundation for categorical-response or multirater IRA methods but is also practically dominant in most empirical studies. We also propose a novel data-generating framework to simulate correlated decision processes between raters. Subsequently, a new estimand, which calibrates the ‘true’ chance-corrected IRA, is introduced while accounting for the potential ‘probabilistic certainty’. Extensive simulations were conducted to evaluate the performance of the reviewed IRA methods under various practical scenarios and were summarised by an agglomerative hierarchical clustering analysis. Finally, we provide recommendations for selecting appropriate IRA statistics based on outcome prevalence and rater characteristics and highlight the need for further advancements in IRA estimation methodologies.
UR - http://www.scopus.com/inward/record.url?scp=85214096979&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85214096979&partnerID=8YFLogxK
U2 - 10.1111/insr.12606
DO - 10.1111/insr.12606
M3 - Comment/debate
AN - SCOPUS:85214096979
SN - 0306-7734
JO - International Statistical Review
JF - International Statistical Review
ER -