Hongwen Guo, Matthew S. Johnson, Daniel F. McCaffrey, Lixong Gu
Practical Considerations in Item Calibration With Small Samples Under Multistage Test Design: A Case Study
Published in: ETS Research Report Series, 2024-02-04
DOI: 10.1002/ets2.12376
Citations: 0
Abstract
The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs with small test-taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies can build on the literature and program-specific data to help fill the gaps between research and practice and to make sound psychometric decisions that address small-sample issues. The studies included the choice of item calibration methods, data collection designs to increase sample sizes, and item response theory models for producing the score conversion tables. Our results showed that, with small samples, the fixed item parameter calibration (FIPC) method consistently performed best for calibrating new items, compared with the traditional separate calibration with scaling method and a newer calibration approach based on the minimum discriminant information adjustment. In addition, concurrent FIPC calibration with data from multiple administrations also improved parameter estimation of new items. However, because of program-specific settings, a simpler model may not improve current practice when the sample size is small and the initial item pools were well calibrated using a two-parameter logistic model with large field trial data.
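To make the FIPC idea concrete, the sketch below illustrates the core mechanism under a two-parameter logistic (2PL) model: previously calibrated anchor items keep their parameters fixed, examinee abilities are estimated from those anchors, and only the new item's parameters are then estimated. This is a minimal, self-contained illustration with simulated data and simple grid-search maximum likelihood; the item parameters, sample size, and two-step estimation scheme are illustrative assumptions, not the operational procedure or software used in the paper.

```python
import math
import random

random.seed(42)

def p2pl(theta, a, b):
    # 2PL item response function: probability of a correct response.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Anchor items with known (pre-calibrated) parameters; these stay FIXED,
# which is the defining feature of fixed item parameter calibration (FIPC).
# All parameter values below are illustrative assumptions.
anchors = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, -0.5), (0.9, 0.5)]
new_true = (1.1, 0.3)  # true parameters of the new item, to be recovered

# Simulate a modest sample, mimicking the small-sample setting of the study.
N = 300
thetas = [random.gauss(0.0, 1.0) for _ in range(N)]
resp_anchor = [[1 if random.random() < p2pl(t, a, b) else 0
                for (a, b) in anchors] for t in thetas]
resp_new = [1 if random.random() < p2pl(t, *new_true) else 0 for t in thetas]

# Step 1: score each examinee on the fixed anchor items (grid MLE of theta).
theta_grid = [x / 10 for x in range(-40, 41)]
def theta_mle(resp):
    best_th, best_ll = 0.0, -1e18
    for th in theta_grid:
        ll = sum(math.log(p2pl(th, a, b)) if u else math.log(1.0 - p2pl(th, a, b))
                 for u, (a, b) in zip(resp, anchors))
        if ll > best_ll:
            best_th, best_ll = th, ll
    return best_th

theta_hat = [theta_mle(r) for r in resp_anchor]

# Step 2: with abilities held at these estimates, grid-search MLE for the
# new item's (a, b). Anchor parameters never move; only the new item's do.
best_ab, best_ll = None, -1e18
for a10 in range(2, 31):         # discrimination a in [0.2, 3.0]
    for b10 in range(-30, 31):   # difficulty b in [-3.0, 3.0]
        a, b = a10 / 10, b10 / 10
        ll = sum(math.log(max(p2pl(t, a, b), 1e-12)) if u
                 else math.log(max(1.0 - p2pl(t, a, b), 1e-12))
                 for u, t in zip(resp_new, theta_hat))
        if ll > best_ll:
            best_ab, best_ll = (a, b), ll

print("true (a, b):", new_true, "estimated:", best_ab)
```

Because the anchor parameters are fixed, the new item is placed directly on the existing scale without a separate linking step, which is why FIPC is attractive when samples are too small to support stable separate calibration and scaling.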