DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models

IF 1.6, Zone 4 (Psychology), Q3 PSYCHOLOGY, APPLIED

Journal of Educational Measurement Pub Date : 2024-02-14 DOI:10.1111/jedm.12384

Carmen Köhler, Lale Khorramdel, Artur Pokropek, Johannes Hartig

Volume 61, Issue 2, pages 325-344. Article page: https://onlinelibrary.wiley.com/doi/10.1111/jedm.12384

Abstract

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The current study compares two approaches for DIF detection: a multiple-group item response theory (MG-IRT) model and a generalized linear mixed model (GLMM). In the MG-IRT model approach, item parameters are constrained to be equal across groups and DIF is evaluated for each item in each group. In the GLMM, groups are treated as random, and item difficulties are modeled as correlated random effects with a joint multivariate normal distribution. Its nested structure allows the estimation of item difficulty variances and covariances at the group level. We use an excerpt from the PISA 2015 reading domain as an exemplary empirical investigation, and conduct a simulation study to compare the performance of the two approaches. Results from the empirical investigation show that the detection of countries with DIF is similar in both approaches. Results from the simulation study confirm this finding and indicate slight advantages of the MG-IRT model approach.
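
The GLMM described above treats group-specific item difficulties as correlated random effects. The following is a minimal simulation sketch of that data-generating idea, not the authors' implementation. It assumes a Rasch-type (one-parameter logistic) response model and a compound-symmetric covariance matrix, neither of which is stated in the abstract; the function name `simulate_mg_dif` and all default values are hypothetical.

```python
import numpy as np


def simulate_mg_dif(n_groups=20, n_items=10, n_persons=500,
                    dif_sd=0.3, rho=0.2, seed=0):
    """Simulate binary responses with group-specific item difficulties.

    Assumed model (illustrative only):
        logit P(y = 1) = theta_pg - (beta_i + u_gi),
        u_g ~ MVN(0, Sigma)
    All defaults are hypothetical values, not taken from the study.
    """
    rng = np.random.default_rng(seed)

    # Overall (group-invariant) item difficulties.
    beta = np.linspace(-1.5, 1.5, n_items)

    # Assumed compound-symmetric covariance of the group-level deviations:
    # variance dif_sd**2 on the diagonal, correlation rho off the diagonal.
    sigma = dif_sd**2 * (rho * np.ones((n_items, n_items))
                         + (1 - rho) * np.eye(n_items))

    # One vector of item-difficulty deviations per group: shape (groups, items).
    u = rng.multivariate_normal(np.zeros(n_items), sigma, size=n_groups)

    # Person abilities nested in groups: shape (groups, persons).
    theta = rng.normal(0.0, 1.0, size=(n_groups, n_persons))

    # Response probabilities: shape (groups, persons, items).
    logits = theta[:, :, None] - (beta + u)[:, None, :]
    prob = 1.0 / (1.0 + np.exp(-logits))

    # Draw binary responses.
    y = (rng.random(prob.shape) < prob).astype(int)
    return y, beta, u, sigma


y, beta, u, sigma = simulate_mg_dif()
```

In this sketch, a large absolute value of `u[g, i]` corresponds to DIF for item `i` in group `g`, and the diagonal and off-diagonal entries of `sigma` play the role of the group-level item difficulty variances and covariances mentioned in the abstract.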

Source journal

Journal of Educational Measurement

CiteScore

2.30

Self-citation rate

7.70%

About the journal: The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings as well as measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.