Exploring Item Bank Stability through Live and Simulated Datasets

Tony Lee, David Coniam, M. Milanovic
Journal of Language Testing & Assessment
DOI: 10.23977/langta.2022.050102

Abstract

LanguageCert manages the construction of its tests, exams and assessments using a sophisticated item banking system which contains large amounts of test material. This material is described, inter alia, in terms of content characteristics such as macroskills and grammatical and lexical features, and measurement characteristics such as Rasch difficulty estimates and fit statistics. In order to produce content- and difficulty-equivalent test forms, it is vital that the items in any LanguageCert bank manifest stable measurement characteristics. The current paper is one of two linked studies exploring the stability of one of the item banks developed by LanguageCert [Note 1]. This particular bank has been used as an adaptive test bank and comprises 820 calibrated items. It has been administered to over 13,000 test takers, each of whom has taken approximately 60 items. The purpose of these two exploratory studies is to examine the stability of this adaptive test item bank from both statistical and operational perspectives. The study compares test taker performance in the live dataset of over 13,000 test takers (each taking approximately 60 items) with a simulated 'full' dataset generated using model-based imputation. Regression lines for the simulated data showed a good match, and Rasch fit statistics were also good, indicating that the items comprising the adaptive item bank are of high quality in terms of both content and statistical stability. Potential future stability was confirmed by results obtained from a Bayesian ANOVA. As mentioned above, such item bank stability is important when item banks are used for multiple purposes, in this context for adaptive testing and the construction of linear tests. The current study therefore lays the groundwork for a follow-up study in which the utility of this adaptive test item bank is verified by the construction, administration and analysis of a number of linear tests.
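The model-based imputation described in the abstract can be illustrated with a minimal sketch. This is not the LanguageCert analysis or data: the item and person parameters below are hypothetical draws, and the difficulty re-estimate uses a crude logit-of-facility heuristic rather than a full Rasch calibration (CML/JML), purely to show how a 'full' response matrix is simulated from the model and how recovered difficulties can be compared to the generating ones.

```python
import numpy as np

rng = np.random.default_rng(42)

def rasch_prob(theta, b):
    """Rasch (1PL) probability of a correct response: P = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# Hypothetical parameters: 820 items, 1,000 simulated test takers.
n_items, n_persons = 820, 1000
b = rng.normal(0.0, 1.0, n_items)        # item difficulties (logits)
theta = rng.normal(0.0, 1.0, n_persons)  # person abilities (logits)

# Model-based imputation: generate a 'full' response matrix in which every
# person answers every item, by sampling responses from the Rasch model.
p = rasch_prob(theta[:, None], b[None, :])        # shape (persons, items)
responses = (rng.random(p.shape) < p).astype(int)

# Crude difficulty re-estimate from item facility (logit of proportion correct);
# marginalising over ability attenuates the scale, so the slope sits below 1,
# but the recovered values should correlate strongly with the generating ones.
facility = responses.mean(axis=0).clip(1e-3, 1 - 1e-3)
b_hat = -np.log(facility / (1.0 - facility))

slope, intercept = np.polyfit(b, b_hat, 1)
r = np.corrcoef(b, b_hat)[0, 1]
print(f"slope={slope:.2f}, r={r:.3f}")
```

A tight regression line between generating and recovered difficulties, as in the study's live-versus-simulated comparison, is what indicates that the bank's calibrations are stable.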