Applied Psychological Measurement: Latest Articles

Are Large-Scale Test Scores Comparable for At-Home Versus Test Center Testing?
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-05-11 DOI: 10.1177/01466216241253795
Katherine E. Castellano, Matthew S. Johnson, Rene Lawless
Abstract: The COVID-19 pandemic led to a proliferation of remote-proctored (or “at-home”) assessments. The lack of standardized setting, device, or in-person proctor during at-home testing makes it markedly distinct from testing at a test center. Comparability studies of at-home and test center scores are important in understanding whether these distinctions impact test scores. This study found no significant differences in at-home versus test center test scores on a large-scale admissions test using either a randomized controlled trial or an observational study after adjusting for differences in sample composition along baseline characteristics.
Citations: 0
Test Security and the Pandemic: Comparison of Test Center and Online Proctor Delivery Modalities
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-04-23 DOI: 10.1177/01466216241248826
Kirk A. Becker, Jinghua Liu, Paul E. Jones
Abstract: Published information is limited regarding the security of testing programs, and even less on the relative security of different testing modalities: in-person at test centers (TC) versus remote online proctored (OP) testing. This article begins by examining indicators of test security violations across a wide range of programs in professional, admissions, and IT fields. We look at high levels of response overlap as a potential indicator of collusion to cheat on the exam and compare rates by modality and between test center types. Next, we scrutinize indicators of potential test security violations for a single large testing program over the course of 14 months, during which the program went from exclusively in-person TC testing to a mix of OP and TC testing. Test security indicators include high response overlap, large numbers of fast correct responses, large numbers of slow correct responses, large test-retest score gains, unusually fast response times for passing candidates, and measures of differential person functioning. These indicators are examined and compared prior to and after the introduction of OP testing. In addition, test-retest modality is examined for candidates who fail and retest subsequent to the introduction of OP testing, with special attention paid to test takers who change modality between the initial attempt and the retest. These data allow us to understand whether indications of content exposure increase with the introduction of OP testing, and whether testing modalities affect potential score increase in a similar way.
Citations: 0
How Scoring Approaches Impact Estimates of Growth in the Presence of Survey Item Ceiling Effects
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-03-16 DOI: 10.1177/01466216241238749
Kelly D. Edwards, J. Soland
Abstract: Survey scores are often the basis for understanding how individuals grow psychologically and socio-emotionally. A known problem with many surveys is that the items are all “easy”—that is, individuals tend to use only the top one or two response categories on the Likert scale. Such an issue could be especially problematic, and lead to ceiling effects, when the same survey is administered repeatedly over time. In this study, we conduct simulation and empirical studies to (a) quantify the impact of these ceiling effects on growth estimates when using typical scoring approaches like sum scores and unidimensional item response theory (IRT) models and (b) examine whether approaches to survey design and scoring, including employing various longitudinal multidimensional IRT (MIRT) models, can mitigate any bias in growth estimates. We show that bias is substantial when using typical scoring approaches and that, while lengthening the survey helps somewhat, using a longitudinal MIRT model with plausible values scoring all but alleviates the issue. Results have implications for scoring surveys in growth studies going forward, as well as understanding how Likert item ceiling effects may be contributing to replication failures.
Citations: 0
Evaluating the Douglas-Cohen IRT Goodness of Fit Measure With BIB Sampling of Items
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-03-14 DOI: 10.1177/01466216241238740
John R. Donoghue, Adrienne N. Sgammato
Abstract: Methods to detect item response theory (IRT) item-level misfit are typically derived assuming fixed test forms. However, IRT is also employed with more complicated test designs, such as the balanced incomplete block (BIB) design used in large-scale educational assessments. This study investigates two modifications of Douglas and Cohen’s 2001 nonparametric method of assessing item misfit, based on A) using block total score and B) pooling booklet level scores for analyzing BIB data. Block-level scores showed extreme inflation of Type I error for short blocks containing 5 or 10 items. The pooled booklet method yielded Type I error rates close to nominal [Formula: see text] in most conditions and had power to detect misfitting items. The study also found that the Douglas and Cohen procedure is only slightly affected by the presence of other misfitting items in the block. The pooled booklet method is recommended for practical applications of Douglas and Cohen’s method with BIB data.
Citations: 0
Detecting Differential Item Functioning in Multidimensional Graded Response Models With Recursive Partitioning
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-03-13 DOI: 10.1177/01466216241238743
Franz Classe, Christoph Kern
Abstract: Differential item functioning (DIF) is a common challenge when examining latent traits in large-scale surveys. In recent work, methods from the field of machine learning such as model-based recursive partitioning have been proposed to identify subgroups with DIF when little theoretical guidance and many potential subgroups are available. On this basis, we propose and compare recursive partitioning techniques for detecting DIF with a focus on measurement models with multiple latent variables and ordinal response data. We implement tree-based approaches for identifying subgroups that contribute to DIF in multidimensional latent variable modeling and propose a robust, yet scalable extension, inspired by random forests. The proposed techniques are applied and compared with simulations. We show that the proposed methods are able to efficiently detect DIF and allow the extraction of decision rules that lead to subgroups with well-fitting models.
Citations: 0
Linking Methods for Multidimensional Forced Choice Tests Using the Multi-Unidimensional Pairwise Preference Model
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-03-11 DOI: 10.1177/01466216241238741
Naidan Tu, Lavanya S. Kumar, Sean Joo, Stephen Stark
Abstract: Applications of multidimensional forced choice (MFC) testing have increased considerably over the last 20 years. Yet there has been little, if any, research on methods for linking the parameter estimates from different samples. This research addressed that important need by extending four widely used methods for unidimensional linking and comparing the efficacy of new estimation algorithms for MFC linking coefficients based on the Multi-Unidimensional Pairwise Preference model (MUPP). More specifically, we compared the efficacy of multidimensional test characteristic curve (TCC), item characteristic curve (ICC; Haebara, 1980), mean/mean (M/M), and mean/sigma (M/S) methods in a Monte Carlo study that also manipulated test length, test dimensionality, sample size, percentage of anchor items, and linking scenarios. Results indicated that the ICC method outperformed the M/M method, which was better than the M/S method, with the TCC method being the least effective. However, as the number of items “per dimension” and the percentage of anchor items increased, the differences between the ICC, M/M, and M/S methods decreased. Study implications and practical recommendations for MUPP linking, as well as limitations, are discussed.
Citations: 0
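For orientation, the mean/sigma (M/S) method named in the abstract above can be illustrated in its classical unidimensional form. The sketch below is not the authors' MUPP extension: it assumes a single latent dimension and a set of anchor items whose difficulty estimates are available on both a base scale and a target scale. The function names `mean_sigma_coefficients` and `rescale_item` are hypothetical, chosen only for this example.

```python
from statistics import mean, pstdev


def mean_sigma_coefficients(b_base, b_target):
    """Return (A, B) so that b_target is approximately A * b_base + B.

    b_base and b_target are difficulty estimates of the same anchor items,
    obtained from two separate calibrations (assumed inputs).
    """
    if len(b_base) != len(b_target) or len(b_base) < 2:
        raise ValueError("need at least two anchor items, paired by position")
    sd_base = pstdev(b_base)
    if sd_base == 0:
        raise ValueError("anchor difficulties on the base scale must vary")
    slope = pstdev(b_target) / sd_base               # A: ratio of standard deviations
    intercept = mean(b_target) - slope * mean(b_base)  # B: aligns the means
    return slope, intercept


def rescale_item(a, b, slope, intercept):
    """Place one item's discrimination a and difficulty b on the target scale."""
    return a / slope, slope * b + intercept


# Illustrative anchor difficulties; the target scale is exactly 2 * base + 0.5.
base = [-1.0, 0.0, 1.0]
target = [-1.5, 0.5, 2.5]
A, B = mean_sigma_coefficients(base, target)
print(A, B)
print(rescale_item(1.0, 1.0, A, B))
```

The mean/mean method differs only in how the slope is obtained (from the ratio of mean discriminations rather than the ratio of difficulty standard deviations), which is one reason the two tend to behave similarly when anchor sets are large.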
Using Interpretable Machine Learning for Differential Item Functioning Detection in Psychometric Tests
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-03-11 DOI: 10.1177/01466216241238744
E. Kraus, Johannes Wild, Sven Hilbert
Abstract: This study presents a novel method to investigate test fairness and differential item functioning combining psychometrics and machine learning. Test unfairness manifests itself in systematic and demographically imbalanced influences of confounding constructs on residual variances in psychometric modeling. Our method aims to account for resulting complex relationships between response patterns and demographic attributes. Specifically, it measures the importance of individual test items and latent ability scores in comparison to a random baseline variable when predicting demographic characteristics. We conducted a simulation study to examine the functionality of our method under various conditions such as linear and complex impact, unfairness and varying number of factors, unfair items, and varying test length. We found that our method detects unfair items as reliably as Mantel–Haenszel statistics or logistic regression analyses but generalizes to multidimensional scales in a straightforward manner. To apply the method, we used random forests to predict migration backgrounds from ability scores and single items of an elementary school reading comprehension test. One item was found to be unfair according to all proposed decision criteria. Further analysis of the item’s content provided plausible explanations for this finding. Analysis code is available at: https://osf.io/s57rw/?view_only=47a3564028d64758982730c6d9c6c547 .
Citations: 0
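The abstract above uses Mantel–Haenszel statistics as a comparison baseline. As a rough illustration of that baseline only (not of the authors' random-forest method), the sketch below computes the classical Mantel–Haenszel common odds ratio for one dichotomous item across matched score strata, plus the conventional delta-scale transformation. The function names and the example counts are assumptions made for this sketch.

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio for one item across matched ability strata.

    Each stratum is a tuple of counts:
    (reference correct, reference incorrect, focal correct, focal incorrect).
    A value near 1.0 suggests no uniform DIF on the item.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * foc_wrong / n
        denominator += ref_wrong * foc_right / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return numerator / denominator


def mh_delta(odds_ratio):
    """Delta-scale effect size: -2.35 * ln(odds ratio)."""
    return -2.35 * math.log(odds_ratio)


# Made-up counts: identical odds in both groups within every stratum.
balanced = [(10, 10, 10, 10), (20, 20, 20, 20)]
print(mantel_haenszel_odds_ratio(balanced))

# Made-up counts: the reference group has higher odds of a correct response.
favoring_reference = [(30, 10, 20, 20)]
ratio = mantel_haenszel_odds_ratio(favoring_reference)
print(ratio, mh_delta(ratio))
```

In practice the strata are usually formed from total test score, so the comparison is between examinees of similar overall ability.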
Benefits of the Curious Behavior of Bayesian Hierarchical Item Response Theory Models—An in-Depth Investigation and Bias Correction
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-01-20 DOI: 10.1177/01466216241227547
Christoph König, Rainer W. Alexandrowicz
Abstract: When using Bayesian hierarchical modeling, a popular approach for Item Response Theory (IRT) models, researchers typically face a tradeoff between the precision and accuracy of the item parameter estimates. Given the pooling principle and variance-dependent shrinkage, the expected behavior of Bayesian hierarchical IRT models is to deliver more precise but biased item parameter estimates, compared to those obtained in nonhierarchical models. Previous research, however, points out the possibility that, in the context of the two-parameter logistic IRT model, the aforementioned tradeoff need not be made. With a comprehensive simulation study, we provide an in-depth investigation into this possibility. The results show a superior performance, in terms of bias, RMSE and precision, of the hierarchical specifications compared to the nonhierarchical counterpart. Under certain conditions, the bias in the item parameter estimates is independent of the bias in the variance components. Moreover, we provide a bias correction procedure for item discrimination parameter estimates. In sum, we show that IRT models create a unique situation where the Bayesian hierarchical approach indeed yields parameter estimates that are not only more precise, but also more accurate, compared to nonhierarchical approaches. We discuss this beneficial behavior from both theoretical and applied points of view.
Citations: 0
Corrigendum to “irtplay: An R Package for Online Item Calibration, Scoring, Evaluation of Model Fit, and Useful Functions for Unidimensional IRT”
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-01-18 DOI: 10.1177/01466216231223043
Citations: 0
Detecting uniform differential item functioning for continuous response computerized adaptive testing
IF 1.2, Zone 4, Psychology
Applied Psychological Measurement Pub Date : 2024-01-17 DOI: 10.1177/01466216241227544
Chun Wang, Ruoyi Zhu
Abstract: Evaluating items for potential differential item functioning (DIF) is an essential step to ensuring measurement fairness. In this article, we focus on a specific scenario, namely, the continuous response, severely sparse, computerized adaptive testing (CAT). Continuous response items are increasingly used in performance-based tasks because they tend to generate more information than traditional dichotomous items. Severe sparsity arises when many items are automatically generated via machine learning algorithms. We propose two uniform DIF detection methods in this scenario. The first is a modified version of the CAT-SIBTEST, a non-parametric method that does not depend on any specific item response theory model assumptions. The second is a regularization method, a parametric, model-based approach. Simulation studies show that both methods are effective in correctly identifying items with uniform DIF. A real data analysis is provided at the end to illustrate the utility and potential caveats of the two methods.
Citations: 0