{"title":"A Diagnostic Tree Model for Adaptive Assessment of Complex Cognitive Processes Using Multidimensional Response Options","authors":"M. Davison, David J. Weiss, Joseph N. DeWeese, Ozge Ersan, Gina Biancarosa, Patrick C. Kennedy","doi":"10.3102/10769986231158301","DOIUrl":"https://doi.org/10.3102/10769986231158301","url":null,"abstract":"A tree model for diagnostic educational testing is described along with Monte Carlo simulations designed to evaluate measurement accuracy based on the model. The model is implemented in an assessment of inferential reading comprehension, the Multiple-Choice Online Causal Comprehension Assessment (MOCCA), through a sequential, multidimensional, computerized adaptive testing (CAT) strategy. Assessment of the first dimension, reading comprehension (RC), is based on the three-parameter logistic model. For diagnostic and intervention purposes, the second dimension, called process propensity (PP), is used to classify struggling students based on their pattern of incorrect responses. In the simulation studies, CAT item selection rules and stopping rules were varied to evaluate their effect on measurement accuracy along dimension RC and classification accuracy along dimension PP. For dimension RC, methods that improved accuracy tended to increase test length. For dimension PP, however, item selection and stopping rules increased classification accuracy without materially increasing test length. A small live-testing pilot study confirmed some of the findings of the simulation studies. Development of the assessment has been guided by psychometric theory, Monte Carlo simulation results, and a theory of instruction and diagnosis.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44970278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Restricted DINA Model: A Comprehensive Cognitive Diagnostic Model for Classroom-Level Assessments","authors":"P. Nájera, F. J. Abad, Chia-Yi Chiu, M. Sorrel","doi":"10.3102/10769986231158829","DOIUrl":"https://doi.org/10.3102/10769986231158829","url":null,"abstract":"The nonparametric classification (NPC) method has been proven to be a suitable procedure for cognitive diagnostic assessments at a classroom level. However, its nonparametric nature impedes the obtention of a model likelihood, hindering the exploration of crucial psychometric aspects, such as model fit or reliability. Reporting the reliability and validity of scores is imperative in any applied context. The present study proposes the restricted deterministic input, noisy “and” gate (R-DINA) model, a parametric cognitive diagnosis model based on the NPC method that provides the same attribute profile classifications as the nonparametric method while allowing to derive a model likelihood and, subsequently, to compute fit and reliability indices. The suitability of the new proposal is examined by means of an exhaustive simulation study and a real data illustration. The results show that the R-DINA model properly recovers the posterior probabilities of attribute mastery, thus becoming a suitable alternative for comprehensive small-scale diagnostic assessments.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43078375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding the Right Grain-Size for Measurement in the Classroom","authors":"M. Wilson","doi":"10.3102/10769986231159006","DOIUrl":"https://doi.org/10.3102/10769986231159006","url":null,"abstract":"This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational measurement, but one of the contributions of this article is that it mainly focuses on the latter two levels. Co-ordination of the content across these two levels can be achieved using the concept of a construct map, which articulates the substantive target property at levels of detail that are appropriate for both teacher planning and within-classroom use. This article then describes a statistical model designed to span these two levels and discusses how best to relate this to the macrolevel. Results from a curriculum and instruction development project on the topic of measurement in the elementary school are demonstrated, showing how they are empirically related.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45422470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Diagnostic Classification Models Application Considering Real-Life Constraints","authors":"Kun Su, R. Henson","doi":"10.3102/10769986231159137","DOIUrl":"https://doi.org/10.3102/10769986231159137","url":null,"abstract":"This article provides a process to carefully evaluate the suitability of a content domain for which diagnostic classification models (DCMs) could be applicable and then optimized steps for constructing a test blueprint for applying DCMs and a real-life example illustrating this process. The content domains were carefully evaluated using a set of defined criteria, which are purposely defined to improve the success rate of DCM implementation. Given the domain, the Q-matrix is determined by a simulation-based approach using correct classification rates as criteria. Finally, a physics test on the final Q-matrix was developed, administered, and analyzed by the author and the subject-matter experts (SMEs).","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44298525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expertise on Offer: Why Isn’t Anyone Buying?","authors":"H. Braun","doi":"10.3102/10769986231160671","DOIUrl":"https://doi.org/10.3102/10769986231160671","url":null,"abstract":"It is a much-lamented fact that research with the potential to inform or influence education policy instead remains policy inert. There are many reasons for this frustrating state of affairs, including a lack of strategic thinking on the part of researchers on how to successfully accomplish outreach—as opposed to communication with peers (in-reach). Another, and a principal focus of this article, is the failure of researchers to appreciate the power of employing compelling narratives to bring their findings to the attention of policymakers and other stakeholders. Accordingly, this article presents some examples of narratives specifically designed for outreach and discusses some of their features. It also considers the challenges in gaining traction with counternarratives once a particular narrative has achieved currency. Researchers should also be mindful of the tenor of the times, with experts now often viewed with skepticism, if not downright hostility. In some quarters, excessive reliance on technocrats is even seen as a threat to democratic governance. The article concludes with some recommendations on how to appropriately enhance the role of research in education policymaking.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"547 - 572"},"PeriodicalIF":2.4,"publicationDate":"2023-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41996110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Item Preknowledge Using Revisits With Speed and Accuracy","authors":"Onur Demirkaya, Ummugul Bezirhan, Jinming Zhang","doi":"10.3102/10769986231153403","DOIUrl":"https://doi.org/10.3102/10769986231153403","url":null,"abstract":"Examinees with item preknowledge tend to obtain inflated test scores that undermine test score validity. With the availability of process data collected in computer-based assessments, the research on detecting item preknowledge has progressed on using both item scores and response times. Item revisit patterns of examinees can also be utilized as an additional source of information. This study proposes a new statistic for detecting item preknowledge when compromised items are known by utilizing the hierarchical speed–accuracy revisits model. By simultaneously evaluating abnormal changes in the latent abilities, speeds, and revisit propensities of examinees, the procedure was found to provide greater statistical power and stronger substantive evidence that an examinee had indeed benefited from item preknowledge.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"521 - 542"},"PeriodicalIF":2.4,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48567321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Causal Latent Transition Model With Multivariate Outcomes and Unobserved Heterogeneity: Application to Human Capital Development","authors":"F. Bartolucci, F. Pennoni, G. Vittadini","doi":"10.3102/10769986221150033","DOIUrl":"https://doi.org/10.3102/10769986221150033","url":null,"abstract":"In order to evaluate the effect of a policy or treatment with pre- and post-treatment outcomes, we propose an approach based on a transition model, which may be applied with multivariate outcomes and accounts for unobserved heterogeneity. This model is based on potential versions of discrete latent variables representing the individual characteristic of interest and may be cast in the hidden (latent) Markov literature for panel data. Therefore, it can be estimated by maximum likelihood in a relatively simple way. The approach extends the difference-in-difference method as it is possible to deal with multivariate outcomes. Moreover, causal effects may be expressed with respect to transition probabilities. The proposal is validated through a simulation study, and it is applied to evaluate educational programs administered to pupils in the sixth and seventh grades during their middle school period. These programs are carried out in an Italian region to improve non-cognitive skills (CSs). We study if they impact also on students’ CSs in Italian and Mathematics in the eighth grade, exploiting the pretreatment test scores available in the fifth grade. The main conclusion is that the educational programs aimed to develop noncognitive abilities help the best students to maintain their higher cognitive abilities over time.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"387 - 419"},"PeriodicalIF":2.4,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48079573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handling Missing Data in Growth Mixture Models","authors":"D. Y. Lee, Jeffrey R. Harring","doi":"10.3102/10769986221149140","DOIUrl":"https://doi.org/10.3102/10769986221149140","url":null,"abstract":"A Monte Carlo simulation was performed to compare methods for handling missing data in growth mixture models. The methods considered in the current study were (a) a fully Bayesian approach using a Gibbs sampler, (b) full information maximum likelihood using the expectation–maximization algorithm, (c) multiple imputation, (d) a two-stage multiple imputation method, and (e) listwise deletion. Of the five methods, it was found that the Bayesian approach and two-stage multiple imputation methods generally produce less biased parameter estimates compared to maximum likelihood or single imputation methods, although key differences were observed. Similarities and disparities among methods are highlighted and general recommendations articulated.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"320 - 348"},"PeriodicalIF":2.4,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46904680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical (In)Efficiency in the Prediction of Dangerous Behavior","authors":"Ehsan Bokhari","doi":"10.3102/10769986221144727","DOIUrl":"https://doi.org/10.3102/10769986221144727","url":null,"abstract":"The prediction of dangerous and/or violent behavior is particularly important to the conduct of the U.S. criminal justice system when it makes decisions about restrictions of personal freedom, such as preventive detention, forensic commitment, parole, and in some states such as Texas, when to permit an execution to proceed of an individual found guilty of a capital crime. This article discusses the prediction of dangerous behavior both through clinical judgment and actuarial assessment. The general conclusion drawn is that for both clinical and actuarial prediction of dangerous behavior, we are far from a level of accuracy that could justify routine use. To support this latter negative assessment, two topic areas are emphasized: (1) the MacArthur Study of Mental Disorder and Violence, including the actuarial instrument developed as part of this project (the Classification of Violence Risk), along with all the data collected that helped develop the instrument; and (2) the U.S. Supreme Court case of Barefoot v. Estelle (1983) and the American Psychiatric Association “friend of the court” brief on the (in)accuracy of clinical prediction for the commission of future violence. Although now three decades old, Barefoot v. Estelle is still the controlling Supreme Court opinion regarding the prediction of future dangerous behavior and the imposition of the death penalty in states, such as Texas; for example, see Coble v. Texas (2011) and the Supreme Court denial of certiorari in that case.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"661 - 682"},"PeriodicalIF":2.4,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47231762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Randomization P-Value Test for Detecting Copying on Multiple-Choice Exams","authors":"J. Lang","doi":"10.3102/10769986221143515","DOIUrl":"https://doi.org/10.3102/10769986221143515","url":null,"abstract":"This article is concerned with the statistical detection of copying on multiple-choice exams. As an alternative to existing permutation- and model-based copy-detection approaches, a simple randomization p-value (RP) test is proposed. The RP test, which is based on an intuitive match-score statistic, makes no assumptions about the distribution of examinees’ answer vectors and hence is broadly applicable. Especially important in this copy-detection setting, the RP test is shown to be exact in that its size is guaranteed to be no larger than a nominal α value. Additionally, simulation results suggest that the RP test is typically more powerful for copy detection than the existing approximate tests. The development of the RP test is based on the idea that the copy-detection problem can be recast as a causal inference and missing data problem. In particular, the observed data are viewed as a subset of a larger collection of potential values, or counterfactuals, and the null hypothesis of “no copying” is viewed as a “no causal effect” hypothesis and formally expressed in terms of constraints on potential variables.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"296 - 319"},"PeriodicalIF":2.4,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49603850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}