{"title":"A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles","authors":"Chia-Wen Chen, Björn Andersson, Jinxin Zhu","doi":"10.1111/jedm.12344","DOIUrl":"10.1111/jedm.12344","url":null,"abstract":"<p>The certainty of response index (CRI) measures respondents' confidence level when answering an item. In conjunction with the answers to the items, previous studies have used descriptive statistics and arbitrary thresholds to identify student knowledge profiles with the CRIs. Whereas this approach overlooked the measurement error of the observed item responses and indices, we address this by proposing a factor mixture model that integrates a latent class model to detect student subgroups and a measurement model to control for student ability and confidence level. Applying the model to 773 seventh graders' responses to an algebra test, where some items were related to new material that had not been taught in class, we found two subgroups: (1) students who had high confidence in answering items involving the new material; and (2) students who had low confidence in answering items involving the new material but higher general self-confidence than the first group. We regressed the posterior probability of the group membership on gender, prior achievement, and preview behavior and found preview behavior a significant factor associated with the membership. Finally, we discussed the implications of the current study for teaching practices and future research.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"28-51"},"PeriodicalIF":1.3,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12344","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43460732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover), ISBN 978-981-33-4232-3 (eBook). xxiii + 553 pp. https://doi.org/10.1007/978-981-33-4232-3","authors":"Bahram Kazemian, Shafigeh Mohammadian","doi":"10.1111/jedm.12343","DOIUrl":"10.1111/jedm.12343","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"536-544"},"PeriodicalIF":1.3,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45401317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Item Scores and Distractors in Person-Fit Assessment","authors":"Kylie Gorney, James A. Wollack","doi":"10.1111/jedm.12345","DOIUrl":"10.1111/jedm.12345","url":null,"abstract":"<p>In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the <math>\u0000 <semantics>\u0000 <msub>\u0000 <mi>l</mi>\u0000 <mi>z</mi>\u0000 </msub>\u0000 <annotation>$l_z$</annotation>\u0000 </semantics></math> and <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>l</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$l_z^*$</annotation>\u0000 </semantics></math> person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through detailed simulations, we show that the new statistics are more powerful than existing statistics in detecting several types of aberrant behavior, and that they are able to control the Type I error rate in instances where the model does not exactly fit the data. A real data example is also provided to demonstrate the utility of the new statistics in an operational setting.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"3-27"},"PeriodicalIF":1.3,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12345","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48816866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures","authors":"Adam Combs","doi":"10.1111/jedm.12342","DOIUrl":"10.1111/jedm.12342","url":null,"abstract":"<p>A common method of checking person-fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches have been proposed that are based on resampling methods using the popular <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$L_{z}^{*}$</annotation>\u0000 </semantics></math> statistic. There has also been proposed a new Bayesian model checking method based on pivotal discrepancy measures (PDMs). A PDM <i>T</i> is a discrepancy measure that is a pivotal quantity with a known reference distribution. A posterior sample of <i>T</i> can be generated using standard Markov chain Monte Carlo output, and a <i>p</i>-value is obtained from probability bounds computed on order statistics of the sample. In this paper, we propose a general procedure to apply this PDM method to person-fit checking in IRT models. We illustrate this using the <math>\u0000 <semantics>\u0000 <msub>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 </msub>\u0000 <annotation>$L_{z}$</annotation>\u0000 </semantics></math> and <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$L_{z}^{*}$</annotation>\u0000 </semantics></math> measures. Simulation studies are done comparing these with the PP method and one of the more recent resampling methods. The results show that the PDM method is more powerful than the PP method. Under certain conditions, it is more powerful than the resampling method, while in others, it is less. The PDM method is also applied to a real data set.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"52-75"},"PeriodicalIF":1.3,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46358680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Several Variations of Simple-Structure MIRT Equating","authors":"Stella Y. Kim, Won-Chan Lee","doi":"10.1111/jedm.12341","DOIUrl":"10.1111/jedm.12341","url":null,"abstract":"<p>The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate feasibility of proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some notable differences between the multidimensional and unidimensional approaches when data exhibited evidence for multidimensionality. In addition, some of the proposed methods were successful in providing equating results for both section-level and composite-level scores, which has not been achieved by most of the existing methodologies. The traditional method of using a set of quadrature points and weights for equating turned out to be computationally intensive, particularly for the data with higher dimensions. The study suggested an alternative way of using the Monte-Carlo approach for such data. This study also proposed a simple-structure true-score equating procedure that does not rely on a multivariate <i>observed</i>-score distribution.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"76-105"},"PeriodicalIF":1.3,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49051834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment","authors":"David W. Dorsey, Hillary R. Michaels","doi":"10.1111/jedm.12331","DOIUrl":"https://doi.org/10.1111/jedm.12331","url":null,"abstract":"<p>We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement—one that has captured our collective interest and imagination. Scientists and practitioners within the domains of organizational and workforce assessment have increasingly used AI in assessment, and its use is now becoming more common in education. While these types of solutions offer their users the promise of efficiency, effectiveness, and a “wow factor,” users need to maintain high standards for validity and fairness in high stakes settings. Due to the complexity of some AI methods and tools, this requirement for adherence to standards may challenge our traditional approaches to building validity and fairness arguments. In this edition, we review what these challenges may look like as validity arguments meet AI in educational assessment domains. We specifically explore how AI impacts Evidence-Centered Design (ECD) and development from assessment concept and coding to scoring and reporting. We also present information on ways to ensure that bias is not built into these systems. Lastly, we discuss future horizons, many that are almost here, for maximizing what AI offers while minimizing negative effects on test takers and programs.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 3","pages":"267-271"},"PeriodicalIF":1.3,"publicationDate":"2022-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137805809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge","authors":"Murat Kasli, Cengiz Zopluoglu, Sarah L. Toton","doi":"10.1111/jedm.12340","DOIUrl":"https://doi.org/10.1111/jedm.12340","url":null,"abstract":"<p>Response times (RTs) have recently attracted a significant amount of attention in the literature as they may provide meaningful information about item preknowledge. In this study, a new model, the Deterministic Gated Lognormal Response Time (DG-LNRT) model, is proposed to identify examinees with item preknowledge using RTs. The proposed model was applied to two different data sets and performance was assessed with false-positive rates, true-positive rates, and precision. The results were compared with another recently proposed Z-statistic. Follow-up simulation studies were also conducted to examine model performance in settings similar to the real data sets. The results indicate that the proposed model is viable and can help detect item preknowledge under certain conditions. However, its performance is highly dependent on the correct specification of the compromised items.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"148-169"},"PeriodicalIF":1.3,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50123901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes","authors":"Rae Yeong Kim, Yun Joo Yoo","doi":"10.1111/jedm.12339","DOIUrl":"10.1111/jedm.12339","url":null,"abstract":"<p>In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when The test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes a cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH) as a multistage testing for CDM. In CD-MST-PH, multiple testlets can be constructed based on separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or the on-the-fly approach. Moreover, testlets are offered sequentially and adaptively, thus improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. The results of simulation study show that CD-MST-PH can improve test accuracy and efficiency relative to the conventional test without adaptive stages.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"126-147"},"PeriodicalIF":1.3,"publicationDate":"2022-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45947771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model","authors":"Seohee Park, Kyung Yong Kim, Won-Chan Lee","doi":"10.1111/jedm.12338","DOIUrl":"10.1111/jedm.12338","url":null,"abstract":"<p>Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an approach to estimate classification consistency and accuracy indices for multiple measures under four possible decision rules: (1) complementary, (2) conjunctive, (3) compensatory, and (4) pairwise combinations of the three. The current study uses the IRT-recursive-based approach with the simple-structure multidimensional IRT model (SS-MIRT) to estimate the classification consistency and accuracy for multiple measures. Theoretical formulations of the four decision rules with a binary decision (Pass/Fail) are presented. The estimation procedures are illustrated using an empirical data example based on SS-MIRT. In addition, this study applies the estimation procedures to the unidimensional IRT (UIRT) context, considering that UIRT is practically used more. This application shows that the proposed procedure of classification consistency and accuracy could be used with a UIRT model for individual measures as an alternative method of SS-MIRT.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"106-125"},"PeriodicalIF":1.3,"publicationDate":"2022-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45264295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Space Model for Process Data","authors":"Yi Chen, Jingru Zhang, Yi Yang, Young-Sun Lee","doi":"10.1111/jedm.12337","DOIUrl":"10.1111/jedm.12337","url":null,"abstract":"<p>The development of human-computer interactive items in educational assessments provides opportunities to extract useful process information for problem-solving. However, the complex, intensive, and noisy nature of process data makes it challenging to model with the traditional psychometric methods. Social network methods have been applied to visualize and analyze process data. Nonetheless, research about statistical modeling of process information using social network methods is still limited. This article explored the application of the latent space model (LSM) for analyzing process data in educational assessment. The adjacent matrix of transitions between actions was created based on the weighted and directed network of action sequences and related auxiliary information. Then, the adjacent matrix was modeled with LSM to identify the lower-dimensional latent positions of actions. Three applications based on the results from LSM were introduced: action clustering, error analysis, and performance measurement. The simulation study showed that LSM can cluster actions from the same problem-solving strategy and measure students’ performance by comparing their action sequences with the optimal strategy. Finally, we analyzed the empirical data from PISA 2012 as a real case scenario to illustrate how to use LSM.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"517-535"},"PeriodicalIF":1.3,"publicationDate":"2022-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42099226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}