A Comparison of Anchor Selection Strategies for DIF Analysis
Haeju Lee, Kyung Yong Kim
Journal of Educational Measurement, 62(2), 311-344. Journal article, published 2025-03-20.
DOI: 10.1111/jedm.12429 (https://onlinelibrary.wiley.com/doi/10.1111/jedm.12429)
Citations: 0
Abstract
When no prior information about differential item functioning (DIF) exists for the items in a test, either the rank-based or the iterative purification procedure might be preferred. Rank-based purification selects anchor items based on a preliminary DIF test. For this preliminary test, research studies have used likelihood ratio test (LRT)-based approaches (e.g., all-others-as-anchors, AOAA, and one-item-anchor, OIA) and an improved version of Lord's Wald test (i.e., anchor-all-test-all, AATA). However, both the LRT- and Wald-based procedures often select DIF items as anchor items and, as a result, inflate Type I error rates. To overcome this issue, the literature has suggested selecting as anchors either the items with the minimum test statistics ($\mathrm{Min}\;G^2/\chi^2$) or the items with nonsignificant test statistics and large discrimination parameter estimates ($\mathrm{NonsigMax}A$). Nevertheless, little research has compared the combinations of the three anchor selection procedures with the two anchor selection criteria. This study therefore compared the performance of the six resulting rank-based strategies in terms of accuracy, power, and Type I error rates. Among the rank-based strategies, the AOAA-based strategies were more robust across various conditions than the AATA- and OIA-based strategies. Additionally, the $\mathrm{Min}\;G^2/\chi^2$ criterion performed better under various conditions than the $\mathrm{NonsigMax}A$ criterion.
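The two anchor selection criteria can be read as ranking rules applied to the output of a preliminary DIF test (AOAA, OIA, or AATA). The following is a minimal Python sketch, assuming the preliminary test has already produced, for each item, a test statistic ($G^2$ or $\chi^2$), its p-value, and a discrimination estimate $\hat{a}$. It does not fit IRT models or run the DIF tests. The `ItemResult` container, the function names, the number of anchors `k`, and the default `alpha = 0.05` are hypothetical choices for illustration, not the authors' implementation. The item values are made up, with p-values roughly matching a $\chi^2$ reference distribution on 2 degrees of freedom.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ItemResult:
    """Hypothetical container for one item's preliminary DIF test output."""
    item_id: str
    stat: float      # G^2 (LRT-based AOAA/OIA) or chi^2 (Wald-based AATA)
    p_value: float   # p-value of that statistic
    a_hat: float     # discrimination parameter estimate


def select_min_stat(results: Sequence[ItemResult], k: int) -> List[str]:
    """Min G^2/chi^2 criterion: the k items with the smallest DIF statistics."""
    ranked = sorted(results, key=lambda r: r.stat)
    return [r.item_id for r in ranked[:k]]


def select_nonsig_max_a(results: Sequence[ItemResult], k: int,
                        alpha: float = 0.05) -> List[str]:
    """NonsigMaxA criterion: among items whose DIF test is nonsignificant
    at level alpha, the k items with the largest discrimination estimates.
    Returns fewer than k items if fewer are nonsignificant."""
    nonsig = [r for r in results if r.p_value >= alpha]
    ranked = sorted(nonsig, key=lambda r: r.a_hat, reverse=True)
    return [r.item_id for r in ranked[:k]]


# Illustrative (made-up) preliminary DIF results for five items.
results = [
    ItemResult("i1", 0.4, 0.82, 1.1),
    ItemResult("i2", 9.7, 0.008, 1.9),
    ItemResult("i3", 1.2, 0.55, 1.6),
    ItemResult("i4", 2.5, 0.29, 0.7),
    ItemResult("i5", 3.1, 0.21, 1.4),
]

print(select_min_stat(results, 2))      # ['i1', 'i3']
print(select_nonsig_max_a(results, 2))  # ['i3', 'i5']
```

Item i2 has the largest discrimination estimate but is excluded by the NonsigMaxA rule because its preliminary test is significant, which shows why that criterion screens on significance before ranking by $\hat{a}$.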
About the journal:
The Journal of Educational Measurement (JEM) publishes original measurement research, provides reviews of measurement publications, and reports on innovative measurement applications. Its topics are of interest both to those concerned with the practice of measurement in field settings and to measurement theorists. In addition to presenting new contributions to measurement theory and practice, JEM serves as a vehicle for improving educational measurement applications in a variety of settings.