Practical Considerations in Item Calibration With Small Samples Under Multistage Test Design: A Case Study

Q3 Social Sciences
Hongwen Guo, Matthew S. Johnson, Daniel F. McCaffrey, Lixong Gu
ETS Research Report Series, Volume 2024, Issue 1, pp. 1-21. Published 2024-02-04. DOI: 10.1002/ets2.12376
https://onlinelibrary.wiley.com/doi/10.1002/ets2.12376

Abstract

Multistage testing (MST) designs have been gaining attention and popularity in educational assessment. For testing programs with small test-taker samples, calibrating new items to replenish the item pool is challenging. In this research, we used the item pools from an operational MST program to illustrate how studies can build on the literature and on program-specific data to help fill gaps between research and practice and to support sound psychometric decisions about small-sample issues. The studies examined the choice of item calibration method, data collection designs that increase sample sizes, and the item response theory (IRT) models used to produce the score conversion tables. Our results showed that, with small samples, the fixed parameter calibration (FIPC) method consistently performed best for calibrating new items, compared with the traditional method of separate calibration followed by scaling and with a newer calibration method based on minimum discriminant information adjustment. In addition, concurrent FIPC calibration using data from multiple administrations improved parameter estimation for new items. However, because of the program-specific settings, a simpler model may not improve on current practice when the sample size is small and the initial item pools were well calibrated using a two-parameter logistic model with data from a large field trial.
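
To make the idea behind fixed parameter calibration concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-parameter logistic model, complete response data (no MST routing or missing responses), a fixed standard normal ability distribution on the anchor scale, and a single new item. The function names `p2pl` and `fipc_new_item` and all arguments are hypothetical. Anchor item parameters are held fixed, and only the new item's discrimination and difficulty are estimated by marginal maximum likelihood with an EM loop.

```python
import numpy as np

def p2pl(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fipc_new_item(resp_anchor, a_anchor, b_anchor, resp_new,
                  n_quad=41, n_em=200, tol=1e-6):
    """Estimate (a, b) for one new item with anchor parameters held fixed.

    resp_anchor: persons x anchor-items array of 0/1 responses (assumed complete)
    resp_new:    length-persons array of 0/1 responses to the new item
    """
    resp_anchor = np.asarray(resp_anchor, dtype=float)
    x = np.asarray(resp_new, dtype=float)[:, None]

    # Quadrature grid with a fixed N(0, 1) prior (an assumption of this sketch).
    nodes = np.linspace(-4.0, 4.0, n_quad)
    prior = np.exp(-0.5 * nodes**2)
    prior /= prior.sum()

    # Anchor likelihood per person and node; it never changes because
    # the anchor parameters are never re-estimated.
    p_anchor = p2pl(nodes[None, :], a_anchor[:, None], b_anchor[:, None])
    log_l = resp_anchor @ np.log(p_anchor) + (1.0 - resp_anchor) @ np.log(1.0 - p_anchor)
    log_l -= log_l.max(axis=1, keepdims=True)  # guard against underflow
    anchor_weight = np.exp(log_l) * prior

    # New item parameterized as logit = slope * theta + intercept.
    X = np.column_stack([nodes, np.ones(n_quad)])
    beta = np.array([1.0, 0.0])

    for _ in range(n_em):
        # E-step: posterior over nodes given anchors and the new item.
        p_new = 1.0 / (1.0 + np.exp(-X @ beta))
        like_new = np.where(x == 1, p_new[None, :], 1.0 - p_new[None, :])
        post = anchor_weight * like_new
        post /= post.sum(axis=1, keepdims=True)
        n_k = post.sum(axis=0)          # expected examinees at each node
        r_k = (post * x).sum(axis=0)    # expected correct responses at each node

        # M-step: Newton-Raphson for the weighted logistic regression.
        beta_old = beta.copy()
        for _ in range(10):
            pk = 1.0 / (1.0 + np.exp(-X @ beta))
            grad = X.T @ (r_k - n_k * pk)
            hess = X.T @ (X * (n_k * pk * (1.0 - pk))[:, None])
            beta = beta + np.linalg.solve(hess, grad)
        if np.max(np.abs(beta - beta_old)) < tol:
            break

    a_new = beta[0]
    b_new = -beta[1] / beta[0]
    return a_new, b_new
```

Because the anchor parameters never move, the new item's estimates land directly on the existing scale, with no separate linking step. Operational FIPC software typically also handles incomplete designs, several new items at once, and updates to the ability distribution, none of which this sketch attempts.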
