On Bank Assembly and Block Selection in Multidimensional Forced-Choice Adaptive Assessments.

IF 2.1 | Zone 3 (Psychology) | Q2 (Mathematics, Interdisciplinary Applications)

Educational and Psychological Measurement | Pub Date: 2023-04-01 | Epub Date: 2022-04-28 | DOI: 10.1177/00131644221087986

Rodrigo S Kreitchmann, Miguel A Sorrel, Francisco J Abad

Educational and Psychological Measurement, 83(2), 294-321. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9972126/pdf/

Citations: 0

Abstract

Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in noncognitive assessments. Although FC has been considered problematic because it yields ipsative scores under classical test theory, item response theory (IRT) models enable the estimation of nonipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the validity of the assessment. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, the simulation study addressed the effect of (a) the bank assembly method (a randomly assembled bank, an optimally assembled bank, or blocks assembled on-the-fly considering every possible pair of items) and (b) the block selection rule (i.e., the T-rule and the Bayesian D- and A-rules) on estimation accuracy, ipsativity, and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a nonadaptive questionnaire was included as a baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian A-rule with questionnaires assembled on-the-fly, the T-rule under this method led to the worst results. This points to the importance of considering both aspects when designing FC CAT.
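The block selection rules named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a logistic pairwise-preference model in which the log-odds of preferring the first item are a1*theta[k1] - a2*theta[k2] + d, and the function names `block_information` and `select_block` are hypothetical. The rules themselves follow their usual definitions in multidimensional adaptive testing: the T-rule maximizes the trace of the information matrix, the D-rule maximizes its determinant, and the A-rule minimizes the trace of its inverse. The Bayesian variants add the inverse of the prior covariance matrix before scoring.

```python
import numpy as np


def block_information(theta, dims, a, d):
    """Fisher information matrix of one pairwise block at theta.

    Assumed model (illustrative only): logit P(first item preferred)
    = a[0] * theta[dims[0]] - a[1] * theta[dims[1]] + d.
    """
    theta = np.asarray(theta, dtype=float)
    g = np.zeros(theta.shape[0])   # gradient of the logit w.r.t. theta
    g[dims[0]] += a[0]
    g[dims[1]] -= a[1]
    p = 1.0 / (1.0 + np.exp(-(g @ theta + d)))
    return p * (1.0 - p) * np.outer(g, g)


def select_block(candidates, accumulated, rule, prior_cov=None):
    """Return the index of the candidate block chosen by the T-, D-, or A-rule.

    candidates  : list of block information matrices at the current estimate
    accumulated : information matrix of the blocks already administered
    prior_cov   : prior covariance; if given, its inverse is added (Bayesian rules)
    """
    base = np.array(accumulated, dtype=float)
    if prior_cov is not None:
        base = base + np.linalg.inv(prior_cov)
    scores = []
    for info in candidates:
        total = base + info
        if rule == "T":
            scores.append(np.trace(total))
        elif rule == "D":
            scores.append(np.linalg.det(total))
        elif rule == "A":
            # Requires an invertible matrix; the prior term ensures this
            # early in the test, when few blocks have been administered.
            scores.append(-np.trace(np.linalg.inv(total)))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(scores))


# Usage: three traits, three candidate blocks, nothing administered yet.
theta_hat = np.zeros(3)
blocks = [((0, 1), (2.0, 1.0), 0.0),
          ((1, 2), (1.0, 1.0), 0.0),
          ((0, 2), (1.5, 1.5), 0.0)]
infos = [block_information(theta_hat, dims, a, d) for dims, a, d in blocks]
accumulated = np.zeros((3, 3))
prior = np.eye(3)   # independent traits with unit variance (an assumption)
choice_t = select_block(infos, accumulated, "T")
choice_d = select_block(infos, accumulated, "D", prior_cov=prior)
choice_a = select_block(infos, accumulated, "A", prior_cov=prior)
```

Because a single pairwise block contributes a rank-one information matrix, the D- and A-rules are only usable at the start of a test once the prior term is added, which is one reason the Bayesian forms are the natural choice in this setting.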




Source journal

Educational and Psychological Measurement (Medicine: Mathematics, Interdisciplinary Applications)

CiteScore

5.50

Self-citation rate

7.40%

Articles published

Review time

6-12 weeks

About the journal: Educational and Psychological Measurement (EPM) publishes refereed scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal with innovative applications.