A Comparison of Anchor Selection Strategies for DIF Analysis

IF 1.6 | Zone 4, Psychology | JCR Q3, PSYCHOLOGY, APPLIED

Journal of Educational Measurement Pub Date : 2025-03-20 DOI:10.1111/jedm.12429

Haeju Lee, Kyung Yong Kim

Journal of Educational Measurement, Volume 62, Issue 2, pp. 311-344. https://onlinelibrary.wiley.com/doi/10.1111/jedm.12429

Abstract

When no prior information about differential item functioning (DIF) exists for items in a test, either the rank-based or iterative purification procedure might be preferred. The rank-based purification selects anchor items based on a preliminary DIF test. For a preliminary DIF test, likelihood ratio test (LRT) based approaches (e.g., all-others-as-anchors: AOAA and one-item-anchor: OIA) and an improved version of Lord's Wald test (i.e., anchor-all-test-all: AATA) have been used in research studies. However, both LRT- and Wald-based procedures often select DIF items as anchor items and, as a result, inflate Type I error rates. To overcome this issue, two criteria for selecting anchor items have been suggested in the literature: choosing items with the minimum test statistics ($\mathrm{Min}\;G^{2}/\chi^{2}$), or choosing items with nonsignificant test statistics and large discrimination parameter estimates (NonsigMaxA). Nevertheless, little research has been done comparing combinations of the three anchor selection procedures paired with the two anchor selection criteria. Thus, the performance of the six rank-based strategies was compared in this study in terms of accuracy, power, and Type I error rates. Among the rank-based strategies, the AOAA-based strategies demonstrated greater robustness across various conditions compared to the AATA- and OIA-based strategies. Additionally, the $\mathrm{Min}\;G^{2}/\chi^{2}$ criterion exhibited better performance under various conditions compared to the NonsigMaxA criterion.
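The two anchor selection criteria named in the abstract can be illustrated with a minimal Python sketch. It assumes a preliminary DIF test (AOAA, OIA, or AATA) has already produced a test statistic, a p-value, and a discrimination estimate for each item. The `PrelimResult` type, the function names, the item values, and the 0.05 significance level are all hypothetical and are not taken from the article. The reading of NonsigMaxA here (keep nonsignificant items, then rank by discrimination) is inferred from the abstract's one-line description and may differ in detail from the authors' procedure.

```python
from dataclasses import dataclass


@dataclass
class PrelimResult:
    """Hypothetical container for one item's preliminary DIF test output."""
    item: str
    stat: float     # preliminary test statistic (G^2 for LRT, chi-square for Wald)
    p_value: float  # p-value of that statistic
    a_hat: float    # estimated discrimination parameter


def select_min_stat(results, n_anchors):
    """Min G^2 / chi^2 criterion: keep the items with the smallest statistics."""
    ranked = sorted(results, key=lambda r: r.stat)
    return [r.item for r in ranked[:n_anchors]]


def select_nonsig_max_a(results, n_anchors, alpha=0.05):
    """NonsigMaxA criterion (as read from the abstract): among items whose
    statistic is nonsignificant, keep those with the largest discrimination."""
    nonsig = [r for r in results if r.p_value >= alpha]
    ranked = sorted(nonsig, key=lambda r: r.a_hat, reverse=True)
    # If fewer than n_anchors items are nonsignificant, fewer are returned;
    # how the article handles that case is not stated in the abstract.
    return [r.item for r in ranked[:n_anchors]]


# Made-up preliminary results for five items (illustration only).
results = [
    PrelimResult("i1", stat=0.4, p_value=0.527, a_hat=0.8),
    PrelimResult("i2", stat=6.9, p_value=0.009, a_hat=1.9),
    PrelimResult("i3", stat=1.2, p_value=0.273, a_hat=1.5),
    PrelimResult("i4", stat=0.9, p_value=0.343, a_hat=1.1),
    PrelimResult("i5", stat=2.5, p_value=0.114, a_hat=1.7),
]

print(select_min_stat(results, 2))      # ['i1', 'i4']
print(select_nonsig_max_a(results, 2))  # ['i5', 'i3']
```

On these made-up values the two criteria pick different anchor sets: the minimum-statistic rule favors the items that look least like DIF items, while the NonsigMaxA rule trades some of that margin for higher discrimination. Pairing either rule with one of the three preliminary procedures gives the six strategies compared in the study.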

Source journal

Journal of Educational Measurement Multiple-

CiteScore

2.30

Self-citation rate

7.70%

Journal description: The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. The topics addressed will interest both those concerned with the practice of measurement in field settings and measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM also serves as a vehicle for improving educational measurement applications in a variety of settings.