DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models

Impact Factor: 1.4 | CAS Zone 4 (Psychology) | JCR Q3 (Psychology, Applied)
Carmen Köhler, Lale Khorramdel, Artur Pokropek, Johannes Hartig
{"title":"DIF Detection for Multiple Groups: Comparing Three-Level GLMMs and Multiple-Group IRT Models","authors":"Carmen Köhler,&nbsp;Lale Khorramdel,&nbsp;Artur Pokropek,&nbsp;Johannes Hartig","doi":"10.1111/jedm.12384","DOIUrl":null,"url":null,"abstract":"<p>For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The current study compares two approaches for DIF detection: a multiple-group item response theory (MG-IRT) model and a generalized linear mixed model (GLMM). In the MG-IRT model approach, item parameters are constrained to be equal across groups and DIF is evaluated for each item in each group. In the GLMM, groups are treated as random, and item difficulties are modeled as correlated random effects with a joint multivariate normal distribution. Its nested structure allows the estimation of item difficulty variances and covariances at the group level. We use an excerpt from the PISA 2015 reading domain as an exemplary empirical investigation, and conduct a simulation study to compare the performance of the two approaches. Results from the empirical investigation show that the detection of countries with DIF is similar in both approaches. Results from the simulation study confirm this finding and indicate slight advantages of the MG-IRT model approach.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"61 2","pages":"325-344"},"PeriodicalIF":1.4000,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12384","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Educational Measurement","FirstCategoryId":"102","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/jedm.12384","RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PSYCHOLOGY, APPLIED","Score":null,"Total":0}
引用次数: 0

Abstract

For assessment scales applied to different groups (e.g., students from different states; patients in different countries), multigroup differential item functioning (MG-DIF) needs to be evaluated in order to ensure that respondents with the same trait level but from different groups have equal response probabilities on a particular item. The current study compares two approaches for DIF detection: a multiple-group item response theory (MG-IRT) model and a generalized linear mixed model (GLMM). In the MG-IRT model approach, item parameters are constrained to be equal across groups and DIF is evaluated for each item in each group. In the GLMM, groups are treated as random, and item difficulties are modeled as correlated random effects with a joint multivariate normal distribution. Its nested structure allows the estimation of item difficulty variances and covariances at the group level. We use an excerpt from the PISA 2015 reading domain as an exemplary empirical investigation, and conduct a simulation study to compare the performance of the two approaches. Results from the empirical investigation show that the detection of countries with DIF is similar in both approaches. Results from the simulation study confirm this finding and indicate slight advantages of the MG-IRT model approach.
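
To make the GLMM structure described above concrete, here is one plausible formalization of a Rasch-type three-level model with random item difficulties. This parameterization is our reading of the abstract, not necessarily the exact one used in the article:

\[
\operatorname{logit}\,\Pr(Y_{pgi}=1) = \theta_{pg} - (\beta_i + u_{gi}), \qquad
\mathbf{u}_g = (u_{g1},\dots,u_{gI})^{\top} \sim \mathcal{N}_I(\mathbf{0}, \boldsymbol{\Sigma}),
\]

where \(\theta_{pg}\) is the trait of person \(p\) in group \(g\), \(\beta_i\) is the overall difficulty of item \(i\), and \(u_{gi}\) is the group-specific deviation of that difficulty. The diagonal of \(\boldsymbol{\Sigma}\) then captures each item's between-group DIF variance, and the off-diagonal elements the covariances the abstract refers to.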

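As a minimal illustration of data generated under such a model, the following NumPy sketch simulates binary responses with correlated group-level item deviations and applies a crude proportion-based screen for group-item combinations. All sample sizes, covariance values, and the 0.10 flagging threshold are hypothetical, and the final heuristic is only a stand-in; the article compares model-based MG-IRT and GLMM procedures, not this raw-proportion check:

import numpy as np

rng = np.random.default_rng(7)

n_groups, n_persons, n_items = 20, 500, 10   # persons nested in groups

# Fixed (grand-mean) item difficulties.
beta = rng.normal(0.0, 1.0, n_items)

# Group-level item deviations u_g ~ MVN(0, Sigma): the source of DIF.
var_u, cov_u = 0.30, 0.05                    # hypothetical DIF variance/covariance
Sigma = np.full((n_items, n_items), cov_u)
np.fill_diagonal(Sigma, var_u)
u = rng.multivariate_normal(np.zeros(n_items), Sigma, size=n_groups)  # (group, item)

# Person traits, drawn within each group.
theta = rng.normal(0.0, 1.0, (n_groups, n_persons))

# Rasch-type response model: logit P = theta - (beta + u).
eta = theta[:, :, None] - (beta[None, None, :] + u[:, None, :])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))  # (group, person, item)

# Crude screening heuristic (not the article's method): flag group-item cells
# whose proportion correct deviates notably from the pooled item proportion.
pooled = y.mean(axis=(0, 1))        # per item
by_group = y.mean(axis=1)           # per group x item
flags = np.abs(by_group - pooled) > 0.10
print(f"flagged group-item combinations: {flags.sum()} of {flags.size}")

In a real analysis, the GLMM approach would instead estimate Sigma directly (e.g., via a mixed-model fit) and inspect the predicted group-level deviations, while the MG-IRT approach would test each item's parameter equality across groups.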

Source Journal: Journal of Educational Measurement
CiteScore: 2.30 | Self-citation rate: 7.70% | Annual publications: 46

The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest those concerned with the practice of measurement in field settings, as well as measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM serves as a vehicle for improving educational measurement applications in a variety of settings.