Evaluating Four Methods for Detecting Differential Item Functioning in Large-Scale Assessments with More Than Two Groups
Authors: Dandan Chen Kaptur, Jinming Zhang
Venue: arXiv - STAT - Applications
Publication date: 2024-08-21
Identifier: arxiv-2408.11922
Abstract
This study evaluated four multi-group differential item functioning (DIF)
methods (the root mean square deviation approach, Wald-1, generalized logistic
regression procedure, and generalized Mantel-Haenszel method) via Monte Carlo
simulation of controlled testing conditions. These conditions varied in the
number of groups, the ability and sample size of the DIF-contaminated group,
the parameter associated with DIF, and the proportion of DIF items. Comparing
the Type-I error rates and power of the methods, we found that the RMSD
approach yielded the best Type-I error rates when used with model-predicted
cutoff values. The same approach was overly conservative when used with the
common cutoff value of 0.1. Implications for future research for educational
researchers and practitioners were discussed.
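
To make the role of the 0.1 cutoff concrete, the sketch below computes a simplified RMSD-type statistic: the square root of the ability-weighted mean squared gap between a group's item response curve and a reference curve. It is an illustration only and not the study's implementation. The 2PL response model, the normal ability density, the function names `icc_2pl` and `rmsd`, and all parameter values are assumptions introduced here. Operational RMSD compares observed response proportions with model-implied ones, whereas this sketch compares two model-implied curves.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under a 2PL model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def rmsd(group_params, reference_params, group_mean=0.0, group_sd=1.0,
         n_points=201, width=6.0):
    """Simplified RMSD between a group's curve and a reference curve.

    Squared differences are weighted by a normal ability density for the
    group, evaluated on an evenly spaced grid (hypothetical settings).
    """
    lo = group_mean - width * group_sd
    step = 2.0 * width * group_sd / (n_points - 1)
    weighted_sq_diff = 0.0
    total_weight = 0.0
    for i in range(n_points):
        theta = lo + i * step
        # Unnormalized normal density; normalization cancels in the ratio.
        weight = math.exp(-0.5 * ((theta - group_mean) / group_sd) ** 2)
        diff = icc_2pl(theta, *group_params) - icc_2pl(theta, *reference_params)
        weighted_sq_diff += weight * diff * diff
        total_weight += weight
    return math.sqrt(weighted_sq_diff / total_weight)


CUTOFF = 0.1  # the commonly used fixed cutoff mentioned in the abstract

reference = (1.0, 0.0)                      # (discrimination a, difficulty b)
no_dif = rmsd((1.0, 0.0), reference)        # identical curves
small_dif = rmsd((1.0, 0.3), reference)     # small difficulty shift
large_dif = rmsd((1.0, 1.0), reference)     # large difficulty shift

for label, value in [("no DIF", no_dif), ("small DIF", small_dif),
                     ("large DIF", large_dif)]:
    print(f"{label}: RMSD = {value:.3f}, flagged = {value > CUTOFF}")
```

Under these illustrative values, the small difficulty shift stays below 0.1 and goes unflagged, while the large shift is flagged. This is one way to picture how a fixed cutoff can behave conservatively.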