Investigation of Classification Accuracy, Test Length and Measurement Precision at Computerized Adaptive Classification Tests

Pub Date: 2021-02-21 | DOI: 10.21031/EPOD.787865
Sedanur Demir, B. Atar
{"title":"计算机自适应分类测试中分类精度、测试长度和测量精度的研究","authors":"Sedanur Demir, B. Atar","doi":"10.21031/EPOD.787865","DOIUrl":null,"url":null,"abstract":"This study aims to compare Sequential Probability Ratio Test (SPRT) and Confidence Interval (CI) classification criteria, Maximum Fisher Information method on the basis of estimated-ability (MFI-EB) and Cut-Point (MFI-CB) item selection methods while ability estimation method is Weighted Likelihood Estimation (WLE) in Computerized Adaptive Classification Testing (CACT), according to the Average Classification Accuracy (ACA), Average Test Length (ATL), and measurement precision under content balancing (Constrained Computerized Adaptive Testing: CCAT and Modified Multinomial Model: MMM) and item exposure control (Sympson-Hetter Method: SH and Item Eligibility Method: IE) when the classification is done based on two, three, or four categories for a unidimensional pool of dichotomous items. Forty-eight conditions are created in Monte Carlo (MC) simulation for the data, generated in R software, including 500 items and 5000 examinees, and the results are calculated over 30 replications. As a result of the study, it was observed that CI performs better in terms of ATL, and SPRT performs better in ACA and correlation, bias, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) values, sequentially; MFI-EB is more useful than MFI-CB. It was also seen that MMM is more successful in content balancing, whereas CCAT is better in terms of test efficiency (ATL and ACA), and IE is superior in terms of item exposure control though SH is more beneficial in test efficiency. Besides, increasing the number of classification categories increases ATL but decreases ACA, and it gives better results in terms of the correlation, bias, RMSE, and MAE values.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Investigation of Classification Accuracy, Test Length and Measurement Precision at Computerized Adaptive Classification Tests\",\"authors\":\"Sedanur Demir, B. Atar\",\"doi\":\"10.21031/EPOD.787865\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study aims to compare Sequential Probability Ratio Test (SPRT) and Confidence Interval (CI) classification criteria, Maximum Fisher Information method on the basis of estimated-ability (MFI-EB) and Cut-Point (MFI-CB) item selection methods while ability estimation method is Weighted Likelihood Estimation (WLE) in Computerized Adaptive Classification Testing (CACT), according to the Average Classification Accuracy (ACA), Average Test Length (ATL), and measurement precision under content balancing (Constrained Computerized Adaptive Testing: CCAT and Modified Multinomial Model: MMM) and item exposure control (Sympson-Hetter Method: SH and Item Eligibility Method: IE) when the classification is done based on two, three, or four categories for a unidimensional pool of dichotomous items. Forty-eight conditions are created in Monte Carlo (MC) simulation for the data, generated in R software, including 500 items and 5000 examinees, and the results are calculated over 30 replications. 
As a result of the study, it was observed that CI performs better in terms of ATL, and SPRT performs better in ACA and correlation, bias, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) values, sequentially; MFI-EB is more useful than MFI-CB. It was also seen that MMM is more successful in content balancing, whereas CCAT is better in terms of test efficiency (ATL and ACA), and IE is superior in terms of item exposure control though SH is more beneficial in test efficiency. Besides, increasing the number of classification categories increases ATL but decreases ACA, and it gives better results in terms of the correlation, bias, RMSE, and MAE values.\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2021-02-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21031/EPOD.787865\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21031/EPOD.787865","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

This study compares the Sequential Probability Ratio Test (SPRT) and Confidence Interval (CI) classification criteria, together with Maximum Fisher Information item selection at the estimated ability (MFI-EB) and at the cut point (MFI-CB), in Computerized Adaptive Classification Testing (CACT), using Weighted Likelihood Estimation (WLE) as the ability estimation method. The comparison is made in terms of Average Classification Accuracy (ACA), Average Test Length (ATL), and measurement precision, under content balancing (Constrained Computerized Adaptive Testing, CCAT, and the Modified Multinomial Model, MMM) and item exposure control (the Sympson-Hetter method, SH, and the Item Eligibility method, IE), when examinees are classified into two, three, or four categories from a unidimensional pool of dichotomous items. Forty-eight conditions were examined in a Monte Carlo (MC) simulation; the data, generated in R, comprised 500 items and 5,000 examinees, and results were computed over 30 replications. The study found that CI performed better in terms of ATL, whereas SPRT performed better in ACA and in the correlation, bias, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) values; MFI-EB was more useful than MFI-CB. MMM was more successful at content balancing, whereas CCAT was better in test efficiency (ATL and ACA); IE was superior for item exposure control, although SH was more beneficial for test efficiency. In addition, increasing the number of classification categories increased ATL and decreased ACA, but yielded better correlation, bias, RMSE, and MAE values.
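For context, the two classification criteria compared in the abstract follow standard sequential decision rules. The sketch below is a minimal illustration in Python (the study itself used R) of both rules for a single cut score under a 2PL model for dichotomous items; it is not the authors' code, and the indifference-region half-width, error rates, item parameters, and cut score are illustrative assumptions.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def sprt_decision(responses, a, b, theta_cut, delta=0.3, alpha=0.05, beta=0.05):
    """Wald's Sequential Probability Ratio Test (SPRT) for one cut score.

    Compares the likelihood of the observed responses at theta_cut + delta
    versus theta_cut - delta and returns 'above', 'below', or 'continue'.
    delta, alpha, and beta are illustrative values, not the study's settings.
    """
    responses = np.asarray(responses)
    p_hi = p_correct(theta_cut + delta, a, b)
    p_lo = p_correct(theta_cut - delta, a, b)
    log_lr = np.sum(responses * np.log(p_hi / p_lo)
                    + (1 - responses) * np.log((1 - p_hi) / (1 - p_lo)))
    upper = np.log((1 - beta) / alpha)   # accept "above the cut"
    lower = np.log(beta / (1 - alpha))   # accept "below the cut"
    if log_lr >= upper:
        return "above"
    if log_lr <= lower:
        return "below"
    return "continue"

def ci_decision(theta_hat, se, theta_cut, z=1.96):
    """Confidence-interval (CI) rule: classify once the interval
    theta_hat +/- z*se no longer contains the cut score."""
    if theta_hat - z * se > theta_cut:
        return "above"
    if theta_hat + z * se < theta_cut:
        return "below"
    return "continue"

# Illustrative use with made-up parameters for ten administered items.
rng = np.random.default_rng(0)
a = rng.uniform(0.8, 2.0, 10)        # discriminations
b = rng.uniform(-1.0, 1.0, 10)       # difficulties
responses = rng.integers(0, 2, 10)   # 0/1 item scores
print(sprt_decision(responses, a, b, theta_cut=0.0))
print(ci_decision(theta_hat=0.4, se=0.25, theta_cut=0.0))
```

In a simulation like the one described, such a rule would be applied after each administered item (and, for three or four categories, once per cut score) until a classification is reached or the maximum test length is exhausted.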