Bayesian Nonparametric Models for Multiple Raters: A General Statistical Framework.

IF 3.1 · Zone 2, Psychology · Q1, MATHEMATICS, INTERDISCIPLINARY APPLICATIONS

Psychometrika Pub Date : 2025-08-11 DOI:10.1017/psy.2025.10035

Giuseppe Mignemi, Ioanna Manolopoulou

Pages: 1-36


Abstract

Rating procedures are crucial in many applied fields (e.g., educational, clinical, emergency). In these contexts, a rater (e.g., teacher, doctor) scores a subject (e.g., student, doctor) on a rating scale. Given raters' variability, several statistical methods have been proposed for assessing and improving the quality of ratings. The analysis and estimation of the Intraclass Correlation Coefficient (ICC) are major concerns in such cases. As evidenced by the literature, the ICC might differ across subgroups of raters and might be affected by contextual factors and subject heterogeneity. Model estimation in the presence of heterogeneity has been one of the recent challenges in this line of research. Consequently, several methods have been proposed to address this issue under a parametric multilevel modelling framework, in which strong distributional assumptions are made. We propose a more flexible model under the Bayesian nonparametric (BNP) framework, in which most of those assumptions are relaxed. By eliciting hierarchical discrete nonparametric priors, the model accommodates clusters among raters and subjects, naturally accounts for heterogeneity, and improves the accuracy of the estimates. We propose a general BNP heteroscedastic framework to analyze continuous and coarse rating data and possible latent differences among subjects and raters. The estimated densities are used to make inferences about the rating process and the quality of the ratings. By exploiting a stick-breaking representation of the discrete nonparametric priors, a general class of ICC indices might be derived for these models. Our method allows us to independently identify latent similarities between subjects and raters and can be applied in precise education to improve personalized teaching programs or interventions. Theoretical results about the ICC are provided together with computational strategies. Simulations and a real-world application are presented, and possible future directions are discussed.
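
The stick-breaking idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model or their ICC class: it assumes a truncated Dirichlet process prior with standard normal atoms, and it uses the classical variance-ratio ICC (subject variance over total variance) as a stand-in. All function names, the truncation level, the concentration parameters, and the residual variance are hypothetical choices made for illustration.

```python
import numpy as np


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process prior.

    Assumption: truncation at n_atoms, with the last break set to 1 so
    that the weights sum exactly to one.
    """
    betas = rng.beta(1.0, alpha, size=n_atoms)
    betas[-1] = 1.0  # close the stick at the truncation level
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining


def discrete_variance(weights, atoms):
    """Variance of a discrete distribution with the given weights and atoms."""
    mean = np.sum(weights * atoms)
    return np.sum(weights * (atoms - mean) ** 2)


def variance_ratio_icc(subject_var, rater_var, residual_var):
    """Classical ICC: share of total variance attributable to subjects."""
    return subject_var / (subject_var + rater_var + residual_var)


rng = np.random.default_rng(0)
n_atoms = 50  # hypothetical truncation level

# One draw of a discrete random distribution for subject effects ...
w_subj = stick_breaking_weights(alpha=1.0, n_atoms=n_atoms, rng=rng)
atoms_subj = rng.normal(0.0, 1.0, size=n_atoms)

# ... and an independent one for rater effects.
w_rater = stick_breaking_weights(alpha=1.0, n_atoms=n_atoms, rng=rng)
atoms_rater = rng.normal(0.0, 1.0, size=n_atoms)

subject_var = discrete_variance(w_subj, atoms_subj)
rater_var = discrete_variance(w_rater, atoms_rater)
icc_draw = variance_ratio_icc(subject_var, rater_var, residual_var=1.0)
print(f"ICC for this prior draw: {icc_draw:.3f}")
```

Because the weights and atoms are explicit, variance components of the random distributions are simple weighted sums, which is what makes a closed-form ICC per draw possible in this simplified setting.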


Source journal

Psychometrika (Mathematics, Interdisciplinary Applications)

CiteScore: 4.40

Self-citation rate: 10.00%

Review time: >12 weeks

About the journal: Psychometrika is devoted to the advancement of theory and methodology for behavioral data in psychology, education, and the social and behavioral sciences generally. Its coverage is offered in two sections: Theory and Methods (T&M), and Application Reviews and Case Studies (ARCS). T&M articles present original research and reviews on the development of quantitative models, statistical methods, and mathematical techniques for evaluating data from psychology, the social and behavioral sciences, and related fields. Application Reviews can be integrative, drawing together disparate methodologies for applications, or comparative and evaluative, discussing advantages and disadvantages of one or more methodologies in applications. Case Studies highlight methodology that deepens understanding of substantive phenomena through more informative data analysis or more elegant data description.