{"title":"Do Subject Matter Experts’ Judgments of Multiple-Choice Format Suitability Predict Item Quality?","authors":"Rebecca F. Berenbon, Bridget C. McHugh","doi":"10.1111/emip.12570","DOIUrl":null,"url":null,"abstract":"<p>To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ suitability for a given content standard and the associated item characteristics. Prior to item writing, we surveyed SMEs on MCQ suitability for each content standard. Following field testing, we then used SMEs’ average ratings for each content standard to predict item characteristics for the tests. We analyzed multilevel models predicting item difficulty (<i>p</i> value), discrimination, and nonfunctioning distractor presence. Items were nested within courses and content standards. There was a curvilinear relationship between SMEs’ ratings and item difficulty such that very low MCQ suitability ratings were predictive of easier items. After controlling for item difficulty, items with higher MCQ suitability ratings had higher discrimination and were less likely to have one or more nonfunctioning distractors. This research has practical implications for optimizing test blueprints. Additionally, psychometricians may use these ratings to better prepare for coaching SMEs during item writing.</p>","PeriodicalId":47345,"journal":{"name":"Educational Measurement-Issues and Practice","volume":null,"pages":null},"PeriodicalIF":2.7000,"publicationDate":"2023-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12570","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Educational Measurement-Issues and Practice","FirstCategoryId":"95","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/emip.12570","RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0
Abstract
To assemble a high-quality test, psychometricians rely on subject matter experts (SMEs) to write high-quality items. However, SMEs are not typically given the opportunity to provide input on which content standards are most suitable for multiple-choice questions (MCQs). In the present study, we explored the relationship between perceived MCQ suitability for a given content standard and the associated item characteristics. Prior to item writing, we surveyed SMEs on MCQ suitability for each content standard. Following field testing, we then used SMEs’ average ratings for each content standard to predict item characteristics for the tests. We analyzed multilevel models predicting item difficulty (p value), discrimination, and nonfunctioning distractor presence. Items were nested within courses and content standards. There was a curvilinear relationship between SMEs’ ratings and item difficulty such that very low MCQ suitability ratings were predictive of easier items. After controlling for item difficulty, items with higher MCQ suitability ratings had higher discrimination and were less likely to have one or more nonfunctioning distractors. This research has practical implications for optimizing test blueprints. Additionally, psychometricians may use these ratings to better prepare for coaching SMEs during item writing.
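For readers who want a concrete picture of the modeling approach described above, the sketch below shows one way a multilevel model of this general form could be specified in Python with statsmodels. The variable names (suitability, p_value, discrimination, course, standard), the simulated data, and the choice to treat content standards as a variance component within courses are assumptions for illustration only; they are not the authors' data, code, or exact specification.

```python
# Hypothetical sketch of a multilevel model like the one described in the abstract:
# item discrimination predicted by SMEs' MCQ-suitability ratings (with a quadratic
# term for the curvilinear relationship) while controlling for item difficulty,
# with items grouped by course and content standard. All names and data are
# illustrative assumptions, not the authors' actual analysis.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_items = 400
df = pd.DataFrame({
    "course": rng.integers(0, 8, n_items).astype(str),       # course each item belongs to
    "standard": rng.integers(0, 40, n_items).astype(str),    # content standard for the item
    "suitability": rng.uniform(1, 5, n_items),                # SMEs' mean MCQ-suitability rating
    "p_value": rng.uniform(0.2, 0.95, n_items),               # classical item difficulty (p value)
})
# Simulated outcome: item discrimination with a modest suitability effect plus noise.
df["discrimination"] = (0.10 + 0.05 * df["suitability"]
                        + rng.normal(0, 0.10, n_items))

# Random intercepts for course; content standards modeled as a variance
# component within course, one way to approximate the nesting described above.
model = smf.mixedlm(
    "discrimination ~ suitability + I(suitability**2) + p_value",
    data=df,
    groups="course",
    vc_formula={"standard": "0 + C(standard)"},
)
result = model.fit()
print(result.summary())
```

The quadratic term `I(suitability**2)` is included because the abstract reports a curvilinear relationship between suitability ratings and item characteristics; analogous models could be fit for the p value outcome, or a mixed-effects logistic model for the presence of one or more nonfunctioning distractors.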