{"title":"Using Bayesian Networks for Cognitive Assessment of Student Understanding of Buoyancy: A Granular Hierarchy Model","authors":"L. Wang, Sun Xiao Jian, Yan Lou Liu, Tao Xin","doi":"10.1080/08957347.2023.2172014","DOIUrl":"https://doi.org/10.1080/08957347.2023.2172014","url":null,"abstract":"ABSTRACT Cognitive diagnostic assessment based on Bayesian networks (BN) is developed in this paper to evaluate student understanding of the physical concept of buoyancy. we propose a three-order granular-hierarchy BN model which accounts for both fine-grained attributes and high-level proficiencies. Conditional independence in the BN structure is tested and utilized to validate the proposed model. The proficiency relationships are verified and the initial Q-matrix is refined. Then, an optimized granular hierarchy model is constructed based on the updated Q-matrix. All variants of the constructed models are evaluated on the basis of the prediction accuracy and the goodness-of-fit test. The experimental results demonstrate that the optimized granular-hierarchy model has the best prediction and model-fitting performance. In general, the BN method not only can provide more flexible modeling approach, but also can help validate or refine the proficiency model and the Q-matrix and this method has its unique advantage in cognitive diagnosis.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49350798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are Large Admissions Test Coaching Effects Widespread? A Longitudinal Analysis of Admissions Test Scores","authors":"Jeffrey A. Dahlke, P. Sackett, N. Kuncel","doi":"10.1080/08957347.2023.2172018","DOIUrl":"https://doi.org/10.1080/08957347.2023.2172018","url":null,"abstract":"ABSTRACT We examine longitudinal data from 120,384 students who took a version of the PSAT/SAT in the 9th, 10th, 11th, and 12th grades. We investigate score changes over time and show that socioeconomic status (SES) is related to the degree of score improvement. We note that the 9th and 10th grade PSAT are low-stakes tests, while the operational SAT is a high-stakes test. We posit that investments in coaching would be uncommon for early PSAT administrations, and would be concentrated on efforts to prepare for the operational SAT. We compare score improvements between 9th and 10th grade with improvements between 10th and 12th grade, examining results separately by level of SES. We find similar levels of score improvement in low-stakes and high-stakes settings, with 3.4% of high-SES and 1.1% of low-SES students showing larger-than-expected score improvements, which is inconsistent with claims that high-SES students have routine access to highly effective coaching.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2023-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42421288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dissecting knowledge, guessing, and blunder in multiple choice assessments.","authors":"Rashid M Abu-Ghazalah, David N Dubins, Gregory M K Poon","doi":"10.1080/08957347.2023.2172017","DOIUrl":"10.1080/08957347.2023.2172017","url":null,"abstract":"<p><p>Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly account for guessing, knowledge and blunder using eight assessments (>9,000 responses) from an undergraduate biotechnology curriculum. A Bayesian implementation of the models, aimed at assessing their robustness to prior beliefs in examinee knowledge, showed that explicit estimators of knowledge are markedly sensitive to prior beliefs with scores as sole input. To overcome this limitation, we examined self-ranked confidence as a proxy knowledge indicator. For our test set, three levels of confidence resolved test performance. Responses rated as least confident were correct more frequently than expected from random selection, reflecting partial knowledge, but were balanced by blunder among the most confident responses. By translating evidence-based guessing and blunder rates to pass marks that statistically qualify a desired level of examinee knowledge, our approach finds practical utility in test analysis and design.</p>","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10201919/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9522330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Personality Aspects and the Underprediction of Women’s Academic Performance","authors":"You Zhou, P. Sackett, Thomas Brothen","doi":"10.1080/08957347.2022.2155652","DOIUrl":"https://doi.org/10.1080/08957347.2022.2155652","url":null,"abstract":"ABSTRACT We sought to replicate prior findings that admissions tests’ underprediction of female college performance was driven in part by the omission of Big 5 personality factors from the predictive model, using 5,400 college students. We investigated gender differences in an elaborated model subdividing the Big 5 into ten aspects. We found differences at the aspect level that were not found at the factor level, and some aspects had unique relationships with academic outcomes. The findings demonstrated the effect of omitted variables on predictive bias.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46522162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Examination of Individual Ability Estimation and Classification Accuracy Under Rapid Guessing Misidentifications","authors":"Joseph A. Rios","doi":"10.1080/08957347.2022.2155653","DOIUrl":"https://doi.org/10.1080/08957347.2022.2155653","url":null,"abstract":"ABSTRACT To mitigate the deleterious effects of rapid guessing (RG) on ability estimates, several rescoring procedures have been proposed. Underlying many of these procedures is the assumption that RG is accurately identified. At present, there have been minimal investigations examining the utility of rescoring approaches when RG is misclassified, and individual scores are reported. To address this limitation, the present simulation study investigates the effect of RG misclassifications on individual examinee ability estimate bias and classification accuracy when using effort-moderated (EM) scoring. This objective is accomplished by manipulating simulee ability level, RG rate, as well as misclassification type and percentage. Results showed that EM scoring significantly improved ability inferences for examinees engaging in RG; however, the effectiveness of this approach was largely dependent on misclassification type. Specifically, across ability levels, bias tended to be on average lower when falsely classifying effortful responses as RG. Although EM scoring improved bias, it was susceptible to elevated false-positive classifications of ability under high RG.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42107151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data","authors":"Holmes W. Finch","doi":"10.1080/08957347.2022.2155650","DOIUrl":"https://doi.org/10.1080/08957347.2022.2155650","url":null,"abstract":"ABSTRACT Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous items when the conditional likelihood of responses to specific categories differ between groups. DSF impacts estimation of the measured trait and reduces the effectiveness of standard DIF detection methods. The purpose of this simulation study was to extend upon earlier work by comparing several methods for detecting the presence of DSF in polytomous items, including an approach based on the lasso estimation of the generalized partial credit model. Results show that the lasso GPCM technique controlled the Type I error rate while yielding power rates somewhat lower than logistic regression and the MIMIC model, which were not able to control the Type I error rate in some conditions. An empirical example is also presented, and implications of this study for practice are discussed.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47299711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Decline as an Indicator of Generalized Test-Taking Disengagement","authors":"S. Wise, G. Kingsbury","doi":"10.1080/08957347.2022.2155651","DOIUrl":"https://doi.org/10.1080/08957347.2022.2155651","url":null,"abstract":"ABSTRACT In achievement testing we assume that students will demonstrate their maximum performance as they encounter test items. Sometimes, however, student performance can decline during a test event, which implies that the test score does not represent maximum performance. This study describes a method for identifying significant performance decline and investigated its utility as an indicator of generalized test-taking disengagement. Analysis of data from a computerized adaptive interim achievement test showed that performance decline classifications exhibited characteristics similar to those from disengagement classifications based on rapid guessing. More importantly, performance decline was found to identify disengagement by many students who would not have been identified as disengaged based on rapid-guessing behavior.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42114164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"When Should Individual Ability Estimates Be Reported if Rapid Guessing Is Present?","authors":"Joseph A. Rios","doi":"10.1080/08957347.2022.2103138","DOIUrl":"https://doi.org/10.1080/08957347.2022.2103138","url":null,"abstract":"<p><b>ABSTRACT</b></p><p>Testing programs are confronted with the decision of whether to report individual scores for examinees that have engaged in rapid guessing (RG). As noted by the <i>Standards for Educational and Psychological Testing</i>, this decision should be based on a documented criterion that determines score exclusion. To this end, a number of heuristic criteria (e.g., exclude all examinees with RG rates of 10%) have been adopted in the literature. Given that these criteria lack strong methodological support, the objective of this simulation study was to evaluate their appropriateness in terms of individual ability estimate and classification accuracy when manipulating both assessment and RG characteristics. The findings provide evidence that employing a common criterion for all examinees may be an ineffective strategy because a given RG percentage may have differing degrees of biasing effects based on test difficulty, examinee ability, and RG pattern. These results suggest that practitioners may benefit from establishing context-specific exclusion criteria that consider test purpose, score use, and targeted examinee trait levels.</p>","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138495008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Not-reached Items: An Issue of Time and of test-taking Disengagement? the Case of PISA 2015 Reading Data","authors":"Elodie Pools","doi":"10.1080/08957347.2022.2103136","DOIUrl":"https://doi.org/10.1080/08957347.2022.2103136","url":null,"abstract":"ABSTRACT Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions and some test-takers are not able to endorse the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequence for the respondents, these NR items can also stem from quitting the test. This article, by means of mixture modeling, investigates heterogeneity in the onset of NR items in reading in PISA 2015. Test-taking behavior, assessed by the response times on the first items of the test, and the risk of NR item onset are modeled simultaneously in a 3-class model that distinguishes rapid, slow and typical respondents. Results suggest that NR items can come from a lack of time or from disengaged behaviors and that the relationship between the number of NR items and ability estimate can be affected by these non-effortful NR responses.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45573999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Response Demands of Reading Comprehension Test Items: A Review of Item Difficulty Modeling Studies","authors":"Steve Ferrara, J. Steedle, R. Frantz","doi":"10.1080/08957347.2022.2103135","DOIUrl":"https://doi.org/10.1080/08957347.2022.2103135","url":null,"abstract":"ABSTRACT Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13 empirical item difficulty modeling studies of reading comprehension tests. We define reading comprehension item response demands as reading passage variables (e.g., length, complexity), passage-by-item variables (e.g., degree of correspondence between item and text, type of information requested), and item stem and response option variables. We report on response demand variables that are related to item difficulty and illustrate how they can be used to manage item difficulty in construct-relevant ways so that empirical item difficulties are within a targeted range (e.g., located within the Proficient or other proficiency level range on a test’s IRT scale, where intended).","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":null,"pages":null},"PeriodicalIF":1.5,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49021008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}