Assessing quality of selection procedures: Lower bound of false positive rate as a function of inter-rater reliability

IF 1.5 | Zone 3 (Psychology) | JCR Q3: MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

British Journal of Mathematical & Statistical Psychology | Pub Date: 2024-04-15 | DOI: 10.1111/bmsp.12343

František Bartoš, Patrícia Martinková

Volume 77, issue 3, pages 651-671. Full text: https://onlinelibrary.wiley.com/doi/10.1111/bmsp.12343


Abstract

Inter-rater reliability (IRR) is one of the commonly used tools for assessing the quality of ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome; the applicant is either selected or not. This final outcome is not considered in IRR, which instead focuses on the ratings of the individual subjects or objects. We outline the connection between the ratings' measurement model (used for IRR) and a binary classification framework. We develop a simple way of approximating the probability of correctly selecting the best applicants which allows us to compute error probabilities of the selection procedure (i.e., false positive and false negative rate) or their lower bounds. We draw connections between the IRR and the binary classification metrics, showing that binary classification metrics depend solely on the IRR coefficient and proportion of selected applicants. We assess the performance of the approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures. We also discuss other possible uses of the explored connections in other contexts, such as educational testing, psychological assessment, and health-related measurement, and implement the computations in the R package IRR2FPR.
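The dependence of the classification metrics on the IRR coefficient and the proportion selected can be illustrated with a small simulation. The sketch below is not the authors' approximation and does not use the IRR2FPR package. It assumes normally distributed true quality and rating error, treats the top proportion by true quality as "the best applicants", and uses the textbook definitions FPR = FP / (FP + TN) and FNR = FN / (FN + TP), which may differ from the definitions in the paper. The function name `simulate_selection` and all parameter names are hypothetical.

```python
import random


def simulate_selection(n_applicants, irr, prop_selected, seed=0):
    """Simulate one selection round; return (false_positive_rate, false_negative_rate)."""
    if n_applicants < 2 or not 0.0 < irr <= 1.0 or not 0.0 < prop_selected < 1.0:
        raise ValueError("need n_applicants >= 2, 0 < irr <= 1, 0 < prop_selected < 1")
    rng = random.Random(seed)
    # Number selected, kept strictly between 0 and n so both rates are defined.
    k = min(n_applicants - 1, max(1, round(n_applicants * prop_selected)))

    # Assumed measurement model: rating = true quality + independent error, with
    # var(true) = irr and var(error) = 1 - irr, so var(true) / var(rating) = irr.
    true = [rng.gauss(0.0, irr ** 0.5) for _ in range(n_applicants)]
    rating = [t + rng.gauss(0.0, (1.0 - irr) ** 0.5) for t in true]

    def top_k(scores):
        # Indices of the k highest scores.
        ranked = sorted(range(n_applicants), key=scores.__getitem__, reverse=True)
        return set(ranked[:k])

    best = top_k(true)        # applicants who should be selected
    selected = top_k(rating)  # applicants who are actually selected

    tp = len(best & selected)
    fp = k - tp                 # selected but not among the best
    fn = k - tp                 # among the best but not selected
    tn = n_applicants - k - fp  # correctly rejected
    return fp / (fp + tn), fn / (fn + tp)


if __name__ == "__main__":
    for irr in (0.3, 0.6, 0.9):
        fpr, fnr = simulate_selection(n_applicants=5000, irr=irr, prop_selected=0.2, seed=1)
        print(f"IRR={irr:.1f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```

Because the number selected equals the number of truly best applicants in this setup, false positives and false negatives occur in equal counts; the two rates differ only through their denominators.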




Source journal: British Journal of Mathematical & Statistical Psychology (Medicine; Mathematics, Interdisciplinary Applications)

CiteScore: 5.00

Self-citation rate: 3.80%

Review time: >12 weeks

Journal description: The British Journal of Mathematical and Statistical Psychology publishes articles relating to areas of psychology which have a greater mathematical or statistical aspect of their argument than is usually acceptable to other journals, including:
• mathematical psychology
• statistics
• psychometrics
• decision making
• psychophysics
• classification
• relevant areas of mathematics, computing and computer software

These include articles that address substantive psychological issues or that develop and extend techniques useful to psychologists. New models for psychological processes, new approaches to existing data, critiques of existing models and improved algorithms for estimating the parameters of a model are examples of articles which may be favoured.