Chance-Corrected Interrater Agreement Statistics for Two-Rater Dichotomous Responses: A Method Review With Comparative Assessment Under Possibly Correlated Decisions

Impact factor 1.8; Zone 3 (Mathematics); JCR Q1 (Statistics & Probability)

International Statistical Review. Pub Date: 2025-01-06. DOI: 10.1111/insr.12606

Zizhong Tian, Vernon M. Chinchilli, Chan Shen, Shouhao Zhou

Volume 93, Issue 2, pages 199-221. Full text: https://onlinelibrary.wiley.com/doi/10.1111/insr.12606


Abstract

Measuring interrater agreement (IRA) is critical for assessing the reliability and validity of ratings across disciplines. Although numerous IRA statistics have been developed, there is little guidance on selecting an appropriate measure, especially when raters' decisions may be correlated. To address this gap, we review a family of chance-corrected IRA statistics for the two-rater dichotomous-response case, a fundamental setting that not only serves as the theoretical foundation for categorical-response or multirater IRA methods but is also the dominant setting in most empirical studies. We also propose a novel data-generating framework to simulate correlated decision processes between raters. We then introduce a new estimand that calibrates the ‘true’ chance-corrected IRA while accounting for potential ‘probabilistic certainty’. Extensive simulations were conducted to evaluate the performance of the reviewed IRA methods under various practical scenarios, and the results were summarised by an agglomerative hierarchical clustering analysis. Finally, we provide recommendations for selecting appropriate IRA statistics based on outcome prevalence and rater characteristics, and we highlight the need for further advances in IRA estimation methodology.
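To make "chance-corrected" concrete, here is a minimal Python sketch for a single two-rater, yes/no table. The abstract does not list which statistics the review covers, so the three shown (Cohen's kappa, Scott's pi, Gwet's AC1) are an assumption, picked because they are widely used and share the form (observed agreement - chance agreement) / (1 - chance agreement). The function name `agreement_stats` and the cell labels `a`, `b`, `c`, `d` are hypothetical, and nothing here reproduces the paper's data-generating framework or its new estimand.

```python
import math


def agreement_stats(a, b, c, d):
    """Chance-corrected agreement for a 2x2 table from two raters.

    a: both raters say "positive"      b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive      d: both say "negative"
    """
    n = a + b + c + d
    if n <= 0:
        raise ValueError("table must contain at least one rated subject")

    p_obs = (a + d) / n        # observed proportion of agreement
    p1 = (a + b) / n           # rater 1's rate of "positive"
    p2 = (a + c) / n           # rater 2's rate of "positive"
    p_bar = (p1 + p2) / 2      # average "positive" rate (prevalence proxy)

    # Each statistic differs only in how it defines agreement by chance.
    chance = {
        "cohen_kappa": p1 * p2 + (1 - p1) * (1 - p2),
        "scott_pi": p_bar ** 2 + (1 - p_bar) ** 2,
        "gwet_ac1": 2 * p_bar * (1 - p_bar),
    }

    # (p_obs - p_e) / (1 - p_e); undefined (NaN) when chance agreement is 1.
    return {
        name: (p_obs - p_e) / (1 - p_e) if p_e < 1 else math.nan
        for name, p_e in chance.items()
    }


if __name__ == "__main__":
    # Illustrative counts only; they are not taken from the paper.
    print(agreement_stats(40, 5, 10, 45))   # roughly balanced prevalence
    print(agreement_stats(90, 5, 5, 0))     # highly skewed prevalence
```

With the skewed table, the raters agree on 90% of subjects, yet Cohen's kappa comes out slightly negative while Gwet's AC1 stays high. This sensitivity to prevalence is one reason the choice of statistic matters.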




Source journal: International Statistical Review (Mathematics: Statistics and Probability)

CiteScore: 4.30

Self-citation rate: 5.00%

Review time: >12 weeks

About the journal: International Statistical Review is the flagship journal of the International Statistical Institute (ISI) and of its family of Associations. It publishes papers of broad and general interest in statistics and probability. The term Review is to be interpreted broadly. The types of papers that are suitable for publication include (but are not limited to) the following: reviews/surveys of significant developments in theory, methodology, statistical computing and graphics, statistical education, and application areas; tutorials on important topics; expository papers on emerging areas of research or application; papers describing new developments and/or challenges in relevant areas; papers addressing foundational issues; papers on the history of statistics and probability; white papers on topics of importance to the profession or society; and historical assessments of seminal papers in the field and their impact.