Assessing quality of selection procedures: Lower bound of false positive rate as a function of inter-rater reliability

Impact factor 1.5 · Zone 3 (Psychology) · JCR Q3 (Mathematics, Interdisciplinary Applications)
František Bartoš, Patrícia Martinková
British Journal of Mathematical & Statistical Psychology, 77(3), 651-671. Published 2024-04-15. DOI: 10.1111/bmsp.12343
https://onlinelibrary.wiley.com/doi/10.1111/bmsp.12343

Abstract

Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the IRR and the binary classification metrics, showing that binary classification metrics depend solely on the IRR coefficient and proportion of selected applicants. We assess the performance of the approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures. We also discuss other possible uses of the explored connections in other contexts, such as educational testing, psychological assessment, and health-related measurement, and implement the computations in the R package IRR2FPR.
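
The dependence of the error rates on the IRR coefficient and the proportion of selected applicants can be illustrated with a small Monte Carlo sketch. It is not the paper's approximation and does not use the IRR2FPR package. It assumes normally distributed true scores and rating errors, treats IRR as the share of rating variance due to true applicant quality, and takes the "truly best" applicants to be those with the highest true scores. The function name `simulate_selection` and all parameter names are hypothetical, and the error rates follow standard binary-classification conventions, which may differ from the definitions used in the paper.

```python
import random


def simulate_selection(irr, proportion_selected, n_applicants=2000, seed=1):
    """Simulate one selection round and return empirical error rates.

    Assumptions (illustrative, not taken from the paper):
    - true quality and rating error are independent standard normals;
    - irr is the ratio of true-score variance to total rating variance;
    - the same number of applicants is "truly best" as is selected.
    """
    if not 0.0 <= irr <= 1.0:
        raise ValueError("irr must lie in [0, 1]")
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must lie strictly in (0, 1)")

    rng = random.Random(seed)
    true_scores = [rng.gauss(0.0, 1.0) for _ in range(n_applicants)]
    # Observed rating with unit variance whose reliability equals irr.
    observed = [
        (irr ** 0.5) * t + ((1.0 - irr) ** 0.5) * rng.gauss(0.0, 1.0)
        for t in true_scores
    ]

    k = max(1, round(proportion_selected * n_applicants))

    def top_k(scores):
        # Indices of the k highest scores.
        order = sorted(range(n_applicants), key=scores.__getitem__, reverse=True)
        return set(order[:k])

    truly_best = top_k(true_scores)
    selected = top_k(observed)

    false_positives = len(selected - truly_best)   # selected, not truly best
    false_negatives = len(truly_best - selected)   # truly best, not selected

    return {
        # FP / (all applicants who are not truly best)
        "false_positive_rate": false_positives / (n_applicants - k),
        # FN / (all applicants who are truly best)
        "false_negative_rate": false_negatives / k,
        # Share of the selected applicants who are not truly best
        "share_of_selected_not_best": false_positives / k,
    }


if __name__ == "__main__":
    for irr in (0.2, 0.5, 0.9):
        print(irr, simulate_selection(irr, proportion_selected=0.2))
```

Under these assumptions the simulated error rates change only when `irr` or `proportion_selected` changes, which mirrors the abstract's statement that the binary classification metrics depend on those two quantities alone.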

Source journal: CiteScore 5.00 · Self-citation rate 3.80% · Articles published 34 · Review time >12 weeks
About the journal: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:
• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software
These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.