Chance-Corrected Interrater Agreement Statistics for Two-Rater Dichotomous Responses: A Method Review With Comparative Assessment Under Possibly Correlated Decisions
Zizhong Tian, Vernon M. Chinchilli, Chan Shen, Shouhao Zhou
Journal: International Statistical Review, 93(2), 199–221
DOI: 10.1111/insr.12606
Published: 2025-01-06
Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/insr.12606
Citation count: 0
Abstract
Measurement of the interrater agreement (IRA) is critical for assessing the reliability and validity of ratings in various disciplines. While numerous IRA statistics have been developed, there is a lack of guidance on selecting appropriate measures, especially when raters' decisions could be correlated. To address this gap, we review a family of chance-corrected IRA statistics for two-rater dichotomous-response cases, a fundamental setting that not only serves as the theoretical foundation for categorical-response or multirater IRA methods but is also practically dominant in most empirical studies, and we propose a novel data-generating framework to simulate correlated decision processes between raters. Subsequently, a new estimand, which calibrates the ‘true’ chance-corrected IRA, is introduced while accounting for the potential ‘probabilistic certainty’. Extensive simulations were conducted to evaluate the performance of the reviewed IRA methods under various practical scenarios and were summarised by an agglomerative hierarchical clustering analysis. Finally, we provide recommendations for selecting appropriate IRA statistics based on outcome prevalence and rater characteristics, and we highlight the need for further advancements in IRA estimation methodologies.
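To illustrate what "chance-corrected" means in the two-rater dichotomous setting the abstract describes, Cohen's kappa is the classic member of this family: it compares observed agreement with the agreement expected if the two raters decided independently with their observed marginal rates. The sketch below is illustrative only (the function name and example counts are ours, not drawn from the paper, which reviews a broader family of such statistics).

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table of counts.

    a: both raters say positive      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive   d: both raters say negative
    """
    n = a + b + c + d
    p_o = (a + d) / n  # observed proportion of agreement
    # expected agreement by chance, assuming independent raters
    # with the observed marginal positive/negative rates
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Example: 50 items, raters agree on 35 of them
print(round(cohen_kappa(20, 5, 10, 15), 3))  # kappa = 0.4
```

Here observed agreement is 0.7, but chance agreement under independence is 0.5, so kappa credits the raters only for the 0.2 of agreement beyond chance, rescaled by the maximum possible excess (0.5). The paper's point is that this independence assumption is exactly what breaks down when raters' decisions are correlated.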
Journal Introduction:
International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessment of seminal papers in the field and their impact.