Latest Articles in Educational and Psychological Measurement

The One-Parameter Logistic Model Can Be True With Zero Probability for a Unidimensional Measuring Instrument: How One Could Go Wrong Removing Items Not Satisfying the Model
IF 2.3 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-08-06 | DOI: 10.1177/00131644251345120
Tenko Raykov, Bingsheng Zhang
Abstract: This note is concerned with the chance of the one-parameter logistic (1PL) model or the Rasch model being true for a unidimensional multi-item measuring instrument. It is pointed out that if a single dimension underlies a scale consisting of dichotomous items, then the probability of either model being correct for that scale can be zero. The question is then addressed of what the consequences could be of removing items that do not follow these models. Using a large number of simulated data sets, a pair of empirically relevant settings is presented where such item elimination can be problematic. Specifically, dropping items from a unidimensional instrument because they do not satisfy the 1PL or Rasch model can yield potentially seriously misleading ability estimates with increased standard errors and prediction error with respect to the latent trait. Implications for educational and behavioral research are discussed.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12328337/pdf/
Citations: 0
Model-Based Person Fit Statistics Applied to the Wechsler Adult Intelligence Scale IV
IF 2.3 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-08-03 | DOI: 10.1177/00131644251339444
Jared M Block, Steven P Reise, Keith F Widaman, Amanda K Montoya, David W Loring, Laura Glass Umfleet, Russell M Bauer, Joseph M Gullett, Brittany Wolff, Daniel L Drane, Kristen Enriquez, Robert M Bilder
Abstract: An important task in clinical neuropsychology is to evaluate whether scores obtained on a test battery, such as the Wechsler Adult Intelligence Scale Fourth Edition (WAIS-IV), can be considered "credible" or "valid" for a particular patient. Such evaluations are typically made based on responses to performance validity tests (PVTs). As a complement to PVTs, we propose that WAIS-IV profiles also be evaluated using a residual-based M-distance (d²_ri) person fit statistic. Large d²_ri values flag profiles that are inconsistent with the factor analytic model underlying the interpretation of test scores. We first established a well-fitting model with four correlated factors for 10 core WAIS-IV subtests derived from the standardization sample. Based on this model, we then performed a Monte Carlo simulation to evaluate whether a hypothesized sampling distribution for d²_ri was accurate and whether d²_ri was computable under different degrees of missing subtest scores. We found that when fewer than 8 subtests were administered, d²_ri could not be computed around 25% of the time. When computable, d²_ri conformed to a χ² distribution with degrees of freedom equal to the number of tests minus the number of factors. A demonstration of the d²_ri index in a large sample of clinical cases is also provided. Findings highlight the potential utility of the d²_ri index as an adjunct to PVTs, offering clinicians an additional method to evaluate WAIS-IV test profiles and improve the accuracy of neuropsychological evaluations.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321812/pdf/
Citations: 0
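The abstract's distributional claim, that a residual-based M-distance follows a χ² distribution with degrees of freedom equal to the number of tests minus the number of factors, can be checked with a small simulation. The sketch below is a generic construction using Bartlett factor-score residuals under an invented simple-structure loading pattern (10 subtests, 4 correlated factors; all parameter values are made up), not the authors' exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical factor model: p = 10 subtests, m = 4 correlated factors.
p, m = 10, 4
Lam = np.zeros((p, m))
for j in range(p):
    Lam[j, j % m] = rng.uniform(0.5, 0.9)        # simple-structure loadings
Psi = np.diag(rng.uniform(0.3, 0.6, size=p))     # unique variances
Phi = 0.5 * np.ones((m, m)) + 0.5 * np.eye(m)    # factor correlations of .5

# Simulate subtest scores x = Lam f + e for n examinees.
n = 20000
F = rng.multivariate_normal(np.zeros(m), Phi, size=n)
E = rng.normal(0.0, np.sqrt(np.diag(Psi)), size=(n, p))
X = F @ Lam.T + E

# Bartlett factor scores and the resulting residuals.
Psi_inv = np.linalg.inv(Psi)
B = np.linalg.inv(Lam.T @ Psi_inv @ Lam) @ Lam.T @ Psi_inv
R = X - (X @ B.T) @ Lam.T

# Squared residual-based M-distance per person: r' Psi^{-1} r.
d2 = np.einsum('ij,jk,ik->i', R, Psi_inv, R)

# Under the model, d2 is chi-square with df = p - m = 6, so its mean
# should be near 6 and its variance near 2 * 6 = 12.
print(d2.mean(), d2.var())
```

Because the Bartlett residual lies in a (p − m)-dimensional subspace, the quadratic form reduces exactly to a χ² variate with p − m degrees of freedom, matching the rule quoted in the abstract.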
Disentangling Qualitatively Different Faking Strategies in High-Stakes Personality Assessments: A Mixture Extension of the Multidimensional Nominal Response Model
IF 2.3 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-07-29 | DOI: 10.1177/00131644251341843
Timo Seitz, Ö Emre C Alagöz, Thorsten Meiser
Abstract: High-stakes personality assessments are often compromised by faking, where test takers distort their responses according to social desirability. Many previous models have accounted for faking by modeling an additional latent dimension that quantifies each test taker's degree of faking. Such models assume a homogeneous response strategy among all test takers, reflected in a measurement model in which substantive traits and faking jointly influence item responses. However, such a model will be misspecified if, for some test takers, item responding is only a function of substantive traits or only a function of faking. To address this limitation, we propose a mixture modeling extension of the multidimensional nominal response model (M-MNRM) that can be used to account for qualitatively different response strategies and to model relationships of strategy use with external variables. In a simulation study, the M-MNRM exhibited good parameter recovery and high classification accuracy across multiple conditions. Analyses of three empirical high-stakes datasets provided evidence for the consistent presence of the specified latent classes in different personnel selection contexts, emphasizing the importance of accounting for this kind of response-behavior heterogeneity in high-stakes assessment data. We end the article with a discussion of the model's utility for psychological measurement.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12310618/pdf/
Citations: 0
Item Difficulty Modeling Using Fine-tuned Small and Large Language Models
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-07-06 | DOI: 10.1177/00131644251344973
Ming Li, Hong Jiao, Tianyi Zhou, Nan Zhang, Sydney Peters, Robert W Lissitz
Abstract: This study investigates methods for item difficulty modeling in large-scale assessments using both small and large language models (LLMs). We introduce novel data augmentation strategies, including augmentation on the fly and distribution balancing, that surpass benchmark performances, demonstrating their effectiveness in mitigating data imbalance and improving model performance. Our results showed that fine-tuned small language models (SLMs) such as Bidirectional Encoder Representations from Transformers (BERT) and RoBERTa yielded lower root mean squared error than the first-place model in the BEA 2024 Shared Task competition, whereas domain-specific models like BioClinicalBERT and PubMedBERT did not provide significant improvements due to distributional gaps. Majority voting among SLMs enhanced prediction accuracy, reinforcing the benefits of ensemble learning. LLMs, such as GPT-4, exhibited strong generalization capabilities but struggled with item difficulty prediction, likely due to limited training data and the absence of explicit difficulty-related context. Chain-of-thought prompting and rationale generation approaches were explored but did not yield substantial improvements, suggesting that additional training data or more sophisticated reasoning techniques may be necessary. Embedding-based methods, particularly using NV-Embed-v2, showed promise but did not outperform our best augmentation strategies, indicating that capturing nuanced difficulty-related features remains a challenge.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12230038/pdf/
Citations: 0
Historical Measurement Information Can Be Used to Improve Estimation of Structural Parameters in Structural Equation Models With Small Samples
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-06-13 | DOI: 10.1177/00131644251330851
James Ohisei Uanhoro, Olushola O Soyoye
Abstract: This study investigates the incorporation of historical measurement information into structural equation models (SEM) with small samples to enhance the estimation of structural parameters. Given the availability of published factor analysis results with loading estimates and standard errors for popular scales, researchers may use this historical information as informative priors in Bayesian SEM (BSEM). We focus on estimating the correlation between two constructs using BSEM after generating data with significant bias in the Pearson correlation of their sum scores due to measurement error. Our findings indicate that incorporating historical information on measurement parameters as priors can improve the accuracy of correlation estimates, mainly when the true correlation is small, a common scenario in psychological research. Priors derived from meta-analytic estimates were especially effective, providing high accuracy and acceptable coverage. However, when the true correlation is large, weakly informative priors on all parameters yield the best results. These results suggest that leveraging historical measurement information in BSEM can enhance structural parameter estimation.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12170579/pdf/
Citations: 0
Evaluating the Performance of a Regularized Differential Item Functioning Method for Testlet-Based Polytomous Items
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-05-31 | DOI: 10.1177/00131644251342512
Jing Huang, M David Miller, Anne Corinne Huggins-Manley, Walter L Leite, Herman T Knopf, Albert D Ritzhaupt
Abstract: This study investigated the effect of testlets on regularization-based differential item functioning (DIF) detection in polytomous items, focusing on the generalized partial credit model with lasso penalization (GPCMlasso) DIF method. Five factors were manipulated: sample size, magnitude of testlet effect, magnitude of DIF, number of DIF items, and type of DIF-inducing covariates. Model performance was evaluated using false-positive rate (FPR) and true-positive rate (TPR). Results showed that the simulation had effective control of FPR across conditions, while the TPR was differentially influenced by the manipulated factors. Generally, a small testlet effect did not noticeably affect the GPCMlasso model's performance regarding FPR and TPR. The findings provide evidence of the effectiveness of the GPCMlasso method for DIF detection in polytomous items when testlets are present. Implications for future research and limitations are also discussed.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12126468/pdf/
Citations: 0
Beta-Binomial Model for Count Data: An Application in Estimating Model-Based Oral Reading Fluency
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-05-30 | DOI: 10.1177/00131644251335914
Xin Qiao, Akihito Kamata, Yusuf Kara, Cornelis Potgieter, Joseph F T Nese
Abstract: In this article, the beta-binomial model for count data is proposed and demonstrated in the context of oral reading fluency (ORF) assessment, where the number of words read correctly (WRC) is of interest. Existing studies adopted the binomial model for count data in similar assessment scenarios. The beta-binomial model, however, takes into account extra variability in count data that is neglected by the binomial model, and it therefore accommodates potential overdispersion. To estimate model-based ORF scores, WRC and response times were jointly modeled. The full Bayesian Markov chain Monte Carlo method was adopted for model parameter estimation. A simulation study showed adequate parameter recovery of the beta-binomial model and evaluated the performance of model fit indices in selecting the true data-generating models. Further, an empirical analysis illustrated the application of the proposed model using a dataset from a computerized ORF assessment. The obtained findings were consistent with the simulation study and demonstrated the utility of adopting the beta-binomial model for count-type item responses from assessment data.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125017/pdf/
Citations: 0
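The overdispersion argument in this abstract is easy to make concrete: if each word is read correctly with a probability that itself varies (Beta-distributed) across passages or occasions, the resulting count has the same mean as a plain binomial but a strictly larger variance. A minimal sketch, with made-up passage length and Beta parameters rather than anything from the article:

```python
from scipy import stats

# Hypothetical ORF passage: n_words words; the per-word success
# probability varies as Beta(a, b) rather than being fixed.
n_words, a, b = 60, 18.0, 6.0
p_mean = a / (a + b)                 # average success probability, 0.75

bb = stats.betabinom(n_words, a, b)  # words-read-correctly, overdispersed
bi = stats.binom(n_words, p_mean)    # binomial with the same mean

print(bb.mean(), bi.mean())          # both 45.0
print(bi.var())                      # n p (1 - p)
print(bb.var())                      # n p (1 - p) [1 + (n - 1) rho]

# The inflation factor is governed by the intra-class correlation rho:
rho = 1.0 / (a + b + 1.0)
```

The binomial model attributes all variability to within-passage sampling; the beta-binomial's extra term `(n - 1) * rho` absorbs the between-occasion variability that the abstract says the binomial neglects.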
Bayesian Thurstonian IRT Modeling: Logical Dependencies as an Accurate Reflection of Thurstone's Law of Comparative Judgment
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-05-30 | DOI: 10.1177/00131644251335586
Hannah Heister, Philipp Doebler, Susanne Frick
Abstract: Thurstonian item response theory (Thurstonian IRT) is a well-established approach to latent trait estimation with forced-choice data of arbitrary block lengths. In the forced-choice format, test takers rank statements within each block. This rank is coded with binary variables. Since each rank is awarded exactly once per block, stochastic dependencies arise; for example, when options A and B have ranks 1 and 3, C must have rank 2 in a block of length 3. Although the original implementation of the Thurstonian IRT model can recover parameters well, it is not completely true to the mathematical model and Thurstone's law of comparative judgment, as impossible binary answer patterns have a positive probability. We refer to this problem as stochastic dependencies, and it is due to unconstrained item intercepts. In addition, there are redundant binary comparisons, resulting in what we call logical dependencies; for example, if within a block A < B and B < C, then A < C must follow, and a binary variable for A < C is not needed. Since current Markov chain Monte Carlo approaches to Bayesian computation are flexible and at the same time promise correct small-sample inference, we investigate an alternative Bayesian implementation of the Thurstonian IRT model considering both stochastic and logical dependencies. We show analytically that the same parameters maximize the posterior likelihood, regardless of the presence or absence of redundant binary comparisons. A comparative simulation reveals a large reduction in computational effort for the alternative implementation, which is due to respecting both dependencies. This investigation therefore suggests that when fitting the Thurstonian IRT model, all dependencies should be considered.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125010/pdf/
Citations: 0
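The two kinds of dependency the abstract names can be enumerated directly for a block of three statements: a ranking induces three binary comparisons, but only the transitive patterns can occur, so two of the eight conceivable binary patterns have probability zero, and the third comparison is always implied by the other two. A small illustrative sketch (the labels and helper are made up):

```python
from itertools import combinations, permutations

def binary_pattern(ranking):
    """Binary coding of one forced-choice block.

    ranking: tuple of statement labels, most-preferred first,
    e.g. ('A', 'C', 'B'). Returns y[(i, j)] = 1 if i is ranked above j,
    for each pair in fixed alphabetical order: (A,B), (A,C), (B,C).
    """
    pos = {s: k for k, s in enumerate(ranking)}
    items = sorted(ranking)
    return {(i, j): int(pos[i] < pos[j]) for i, j in combinations(items, 2)}

# All patterns that a ranking of a 3-statement block can produce:
observed = {tuple(binary_pattern(r).values())
            for r in permutations(['A', 'B', 'C'])}

# Only 3! = 6 of the 2**3 = 8 binary patterns arise; the two cyclic
# patterns, e.g. A<B, B<C but C<A, are impossible (probability zero).
print(len(observed))
```

Any one of the three binary variables is redundant given the other two, which is exactly the "logical dependency" the article exploits to reduce computational effort.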
Using Biclustering to Detect Cheating in Real Time on Mixed-Format Tests
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-05-24 | DOI: 10.1177/00131644251333143
Hyeryung Lee, Walter P Vispoel
Abstract: We evaluated a real-time biclustering method for detecting cheating on mixed-format assessments that included dichotomous, polytomous, and multi-part items. Biclustering jointly groups examinees and items by identifying subgroups of test takers who exhibit similar response patterns on specific subsets of items. This method's flexibility and minimal assumptions about examinee behavior make it computationally efficient and highly adaptable. To further fine-tune accuracy and reduce false positives in real-time detection, enhanced statistical significance tests were incorporated into the illustrated algorithms. Two simulation studies were conducted to assess detection across varying testing conditions. In the first study, the method effectively detected cheating on tests composed entirely of either dichotomous or non-dichotomous items. In the second study, we examined tests with varying mixed item formats and again observed strong detection performance. In both studies, detection performance was examined at each timestamp in real time and evaluated under three varying conditions: proportion of cheaters, cheating group size, and proportion of compromised items. Across conditions, the method demonstrated strong computational efficiency, underscoring its suitability for real-time applications. Overall, these results highlight the adaptability, versatility, and effectiveness of biclustering in detecting cheating in real time while maintaining low false-positive rates.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12104213/pdf/
Citations: 0
Using Deep Reinforcement Learning to Decide Test Length
IF 2.1 | CAS Zone 3 | Psychology
Educational and Psychological Measurement | Pub Date: 2025-05-03 | DOI: 10.1177/00131644251332972
James Zoucha, Igor Himelfarb, Nai-En Tang
Abstract: This study explored the application of deep reinforcement learning (DRL) as an innovative approach to optimize test length. The primary focus was to evaluate whether the current length of the National Board of Chiropractic Examiners Part I Exam is justified. By modeling the problem as a combinatorial optimization task within a Markov decision process framework, an algorithm was used that constructs test forms from a finite set of items while adhering to critical structural constraints, such as content representation and item difficulty distribution. The findings reveal that although the DRL algorithm succeeded in identifying shorter test forms that maintained comparable ability estimation accuracy, the existing test length of 240 items remains advisable, as the shorter forms did not maintain the structural constraints. Furthermore, the study highlighted the inherent adaptability of DRL to continuously learn about a test taker's latent abilities and dynamically adjust to their response patterns, making it well suited for personalized testing environments. This dynamic capability supports real-time decision-making in item selection, improving both efficiency and precision in ability estimation. Future research is encouraged to focus on expanding the item bank and leveraging advanced computational resources to enhance the algorithm's capacity to search for shorter, structurally compliant test forms.
Open Access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12049363/pdf/
Citations: 0