剖析多项选择评估中的知识、猜测和失误。

IF 1.1 4区教育学 Q3 EDUCATION & EDUCATIONAL RESEARCH

Applied Measurement in Education Pub Date : 2023-01-01 Epub Date: 2023-02-21 DOI:10.1080/08957347.2023.2172017

Rashid M Abu-Ghazalah, David N Dubins, Gregory M K Poon

{"title":"剖析多项选择评估中的知识、猜测和失误。","authors":"Rashid M Abu-Ghazalah, David N Dubins, Gregory M K Poon","doi":"10.1080/08957347.2023.2172017","DOIUrl":null,"url":null,"abstract":"Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly account for guessing, knowledge and blunder using eight assessments (>9,000 responses) from an undergraduate biotechnology curriculum. A Bayesian implementation of the models, aimed at assessing their robustness to prior beliefs in examinee knowledge, showed that explicit estimators of knowledge are markedly sensitive to prior beliefs with scores as sole input. To overcome this limitation, we examined self-ranked confidence as a proxy knowledge indicator. For our test set, three levels of confidence resolved test performance. Responses rated as least confident were correct more frequently than expected from random selection, reflecting partial knowledge, but were balanced by blunder among the most confident responses. By translating evidence-based guessing and blunder rates to pass marks that statistically qualify a desired level of examinee knowledge, our approach finds practical utility in test analysis and design.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"36 1","pages":"80-98"},"PeriodicalIF":1.1000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10201919/pdf/","citationCount":"0","resultStr":"{\"title\":\"Dissecting knowledge, guessing, and blunder in multiple choice assessments.\",\"authors\":\"Rashid M Abu-Ghazalah, David N Dubins, Gregory M K Poon\",\"doi\":\"10.1080/08957347.2023.2172017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly account for guessing, knowledge and blunder using eight assessments (>9,000 responses) from an undergraduate biotechnology curriculum. A Bayesian implementation of the models, aimed at assessing their robustness to prior beliefs in examinee knowledge, showed that explicit estimators of knowledge are markedly sensitive to prior beliefs with scores as sole input. To overcome this limitation, we examined self-ranked confidence as a proxy knowledge indicator. For our test set, three levels of confidence resolved test performance. Responses rated as least confident were correct more frequently than expected from random selection, reflecting partial knowledge, but were balanced by blunder among the most confident responses. By translating evidence-based guessing and blunder rates to pass marks that statistically qualify a desired level of examinee knowledge, our approach finds practical utility in test analysis and design.\",\"PeriodicalId\":51609,\"journal\":{\"name\":\"Applied Measurement in Education\",\"volume\":\"36 1\",\"pages\":\"80-98\"},\"PeriodicalIF\":1.1000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10201919/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Measurement in Education\",\"FirstCategoryId\":\"95\",\"ListUrlMain\":\"https://doi.org/10.1080/08957347.2023.2172017\",\"RegionNum\":4,\"RegionCategory\":\"教育学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/2/21 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Measurement in Education","FirstCategoryId":"95","ListUrlMain":"https://doi.org/10.1080/08957347.2023.2172017","RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/2/21 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}

引用次数: 0

摘要

多项选择的结果本质上是概率结果，因为正确答案反映了知识和猜测的结合，而错误答案则额外反映了失误，即自信犯下的错误。为了客观地从多选题测试结构中的答案中分辨出知识点，我们使用一个本科生物技术课程的八个评估（超过 9,000 个答案）对明确考虑了猜测、知识点和失误的概率模型进行了评估。模型的贝叶斯实施旨在评估其对考生知识的先验信念的稳健性，结果表明，以分数为唯一输入的知识显式估算器对先验信念非常敏感。为了克服这一局限性，我们将自我排序的信心作为知识的替代指标。在我们的测试集中，三个信心等级决定了测试成绩。被评为最不自信的回答的正确率高于随机选择的预期正确率，这反映了部分知识，但在最自信的回答中被失误所平衡。通过将基于证据的猜测率和失误率转化为及格分数，在统计学上对考生所需的知识水平进行鉴定，我们的方法在测试分析和设计中具有实用价值。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Dissecting knowledge, guessing, and blunder in multiple choice assessments.

Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly account for guessing, knowledge and blunder using eight assessments (>9,000 responses) from an undergraduate biotechnology curriculum. A Bayesian implementation of the models, aimed at assessing their robustness to prior beliefs in examinee knowledge, showed that explicit estimators of knowledge are markedly sensitive to prior beliefs with scores as sole input. To overcome this limitation, we examined self-ranked confidence as a proxy knowledge indicator. For our test set, three levels of confidence resolved test performance. Responses rated as least confident were correct more frequently than expected from random selection, reflecting partial knowledge, but were balanced by blunder among the most confident responses. By translating evidence-based guessing and blunder rates to pass marks that statistically qualify a desired level of examinee knowledge, our approach finds practical utility in test analysis and design.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Applied Measurement in Education Multiple-

CiteScore

2.50

自引率

13.30%

发文量

期刊介绍： Because interaction between the domains of research and application is critical to the evaluation and improvement of new educational measurement practices, Applied Measurement in Education" prime objective is to improve communication between academicians and practitioners. To help bridge the gap between theory and practice, articles in this journal describe original research studies, innovative strategies for solving educational measurement problems, and integrative reviews of current approaches to contemporary measurement issues. Peer Review Policy: All review papers in this journal have undergone editorial screening and peer review.