{"title":"A Unified Comparison of IRT-Based Effect Sizes for DIF Investigations","authors":"R. Philip Chalmers","doi":"10.1111/jedm.12347","DOIUrl":"10.1111/jedm.12347","url":null,"abstract":"<p>Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and NUIDS), the standardized indices of impact, and the differential response functioning (DRF) statistics. However, the relationship between these proposed statistics has not been fully discussed, particularly with respect to population parameter definitions and recovery performance across independent samples. To address these issues, this article provides a unified presentation of competing DIF ES definitions and estimators, and evaluates the recovery efficacy of these competing estimators using a set of Monte Carlo simulation experiments. Statistical and inferential properties of the estimators are discussed, as well as future areas of research in this model-based area of bias quantification.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 2","pages":"318-350"},"PeriodicalIF":1.3,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47360097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Statistical Test for the Detection of Item Compromise Combining Responses and Response Times","authors":"Wim J. van der Linden, Dmitry I. Belov","doi":"10.1111/jedm.12346","DOIUrl":"10.1111/jedm.12346","url":null,"abstract":"<p>A test of item compromise is presented which combines the test takers' responses and response times (RTs) into a statistic defined as the number of correct responses on the item for test takers with RTs flagged as suspicious. The test has null and alternative distributions belonging to the well-known family of compound binomial distributions, is simple to calculate, and has results that are easy to interpret. It also demonstrated nearly perfect power for the detection of compromise with no more than 10 test takers with preknowledge of the more difficult and discriminating items in a set of empirical examples. For the easier and less discriminating items, the presence of some 20 test takers with preknowledge still sufficed. A test based on the reverse statistic of the total time by test takers with responses flagged as suspicious may seem a natural alternative but misses the property of a monotone likelihood ratio necessary to decide between a test that should be left or right sided.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 2","pages":"235-254"},"PeriodicalIF":1.3,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12346","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47060232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully Gibbs Sampling Algorithms for Bayesian Variable Selection in Latent Regression Models","authors":"Kazuhiro Yamaguchi, Jihong Zhang","doi":"10.1111/jedm.12348","DOIUrl":"https://doi.org/10.1111/jedm.12348","url":null,"abstract":"<p>This study proposed Gibbs sampling algorithms for variable selection in a latent regression model under a unidimensional two-parameter logistic item response theory model. Three types of shrinkage priors were employed to obtain shrinkage estimates: double-exponential (i.e., Laplace), horseshoe, and horseshoe+ priors. These shrinkage priors were compared to a uniform prior case in both simulation and real data analysis. The simulation study revealed that two types of horseshoe priors had a smaller root mean square errors and shorter 95% credible interval lengths than double-exponential or uniform priors. In addition, the horseshoe+ prior was slightly more stable than the horseshoe prior. The real data example successfully proved the utility of horseshoe and horseshoe+ priors in selecting effective predictive covariates for math achievement.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 2","pages":"202-234"},"PeriodicalIF":1.3,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50154343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles","authors":"Chia-Wen Chen, Björn Andersson, Jinxin Zhu","doi":"10.1111/jedm.12344","DOIUrl":"10.1111/jedm.12344","url":null,"abstract":"<p>The certainty of response index (CRI) measures respondents' confidence level when answering an item. In conjunction with the answers to the items, previous studies have used descriptive statistics and arbitrary thresholds to identify student knowledge profiles with the CRIs. Whereas this approach overlooked the measurement error of the observed item responses and indices, we address this by proposing a factor mixture model that integrates a latent class model to detect student subgroups and a measurement model to control for student ability and confidence level. Applying the model to 773 seventh graders' responses to an algebra test, where some items were related to new material that had not been taught in class, we found two subgroups: (1) students who had high confidence in answering items involving the new material; and (2) students who had low confidence in answering items involving the new material but higher general self-confidence than the first group. We regressed the posterior probability of the group membership on gender, prior achievement, and preview behavior and found preview behavior a significant factor associated with the membership. Finally, we discussed the implications of the current study for teaching practices and future research.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"28-51"},"PeriodicalIF":1.3,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12344","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43460732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Betty Lanteigne, Christine Coombe, & James Dean Brown. 2021. Challenges in Language Testing around the World: Insights for language test users. Singapore: Springer, 2021, 129.99 € (hardcover), ISBN 978-981-33-4232-3 (eBook). xxiii + 553 pp. https://doi.org/10.1007/978-981-33-4232-3","authors":"Bahram Kazemian, Shafigeh Mohammadian","doi":"10.1111/jedm.12343","DOIUrl":"10.1111/jedm.12343","url":null,"abstract":"","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 4","pages":"536-544"},"PeriodicalIF":1.3,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45401317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Item Scores and Distractors in Person-Fit Assessment","authors":"Kylie Gorney, James A. Wollack","doi":"10.1111/jedm.12345","DOIUrl":"10.1111/jedm.12345","url":null,"abstract":"<p>In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the <math>\u0000 <semantics>\u0000 <msub>\u0000 <mi>l</mi>\u0000 <mi>z</mi>\u0000 </msub>\u0000 <annotation>$l_z$</annotation>\u0000 </semantics></math> and <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>l</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$l_z^*$</annotation>\u0000 </semantics></math> person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through detailed simulations, we show that the new statistics are more powerful than existing statistics in detecting several types of aberrant behavior, and that they are able to control the Type I error rate in instances where the model does not exactly fit the data. A real data example is also provided to demonstrate the utility of the new statistics in an operational setting.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"3-27"},"PeriodicalIF":1.3,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12345","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48816866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Bayesian Person-Fit Analysis Method Using Pivotal Discrepancy Measures","authors":"Adam Combs","doi":"10.1111/jedm.12342","DOIUrl":"10.1111/jedm.12342","url":null,"abstract":"<p>A common method of checking person-fit in Bayesian item response theory (IRT) is the posterior-predictive (PP) method. In recent years, more powerful approaches have been proposed that are based on resampling methods using the popular <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$L_{z}^{*}$</annotation>\u0000 </semantics></math> statistic. There has also been proposed a new Bayesian model checking method based on pivotal discrepancy measures (PDMs). A PDM <i>T</i> is a discrepancy measure that is a pivotal quantity with a known reference distribution. A posterior sample of <i>T</i> can be generated using standard Markov chain Monte Carlo output, and a <i>p</i>-value is obtained from probability bounds computed on order statistics of the sample. In this paper, we propose a general procedure to apply this PDM method to person-fit checking in IRT models. We illustrate this using the <math>\u0000 <semantics>\u0000 <msub>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 </msub>\u0000 <annotation>$L_{z}$</annotation>\u0000 </semantics></math> and <math>\u0000 <semantics>\u0000 <msubsup>\u0000 <mi>L</mi>\u0000 <mi>z</mi>\u0000 <mo>∗</mo>\u0000 </msubsup>\u0000 <annotation>$L_{z}^{*}$</annotation>\u0000 </semantics></math> measures. Simulation studies are done comparing these with the PP method and one of the more recent resampling methods. The results show that the PDM method is more powerful than the PP method. Under certain conditions, it is more powerful than the resampling method, while in others, it is less. The PDM method is also applied to a real data set.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"52-75"},"PeriodicalIF":1.3,"publicationDate":"2022-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46358680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Several Variations of Simple-Structure MIRT Equating","authors":"Stella Y. Kim, Won-Chan Lee","doi":"10.1111/jedm.12341","DOIUrl":"10.1111/jedm.12341","url":null,"abstract":"<p>The current study proposed several variants of simple-structure multidimensional item response theory equating procedures. Four distinct sets of data were used to demonstrate feasibility of proposed equating methods for two different equating designs: a random groups design and a common-item nonequivalent groups design. Findings indicated some notable differences between the multidimensional and unidimensional approaches when data exhibited evidence for multidimensionality. In addition, some of the proposed methods were successful in providing equating results for both section-level and composite-level scores, which has not been achieved by most of the existing methodologies. The traditional method of using a set of quadrature points and weights for equating turned out to be computationally intensive, particularly for the data with higher dimensions. The study suggested an alternative way of using the Monte-Carlo approach for such data. This study also proposed a simple-structure true-score equating procedure that does not rely on a multivariate <i>observed</i>-score distribution.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"76-105"},"PeriodicalIF":1.3,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/jedm.12341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49051834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Validity Arguments Meet Artificial Intelligence in Innovative Educational Assessment","authors":"David W. Dorsey, Hillary R. Michaels","doi":"10.1111/jedm.12331","DOIUrl":"https://doi.org/10.1111/jedm.12331","url":null,"abstract":"<p>We have dramatically advanced our ability to create rich, complex, and effective assessments across a range of uses through technology advancement. Artificial Intelligence (AI) enabled assessments represent one such area of advancement—one that has captured our collective interest and imagination. Scientists and practitioners within the domains of organizational and workforce assessment have increasingly used AI in assessment, and its use is now becoming more common in education. While these types of solutions offer their users the promise of efficiency, effectiveness, and a “wow factor,” users need to maintain high standards for validity and fairness in high stakes settings. Due to the complexity of some AI methods and tools, this requirement for adherence to standards may challenge our traditional approaches to building validity and fairness arguments. In this edition, we review what these challenges may look like as validity arguments meet AI in educational assessment domains. We specifically explore how AI impacts Evidence-Centered Design (ECD) and development from assessment concept and coding to scoring and reporting. We also present information on ways to ensure that bias is not built into these systems. Lastly, we discuss future horizons, many that are almost here, for maximizing what AI offers while minimizing negative effects on test takers and programs.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"59 3","pages":"267-271"},"PeriodicalIF":1.3,"publicationDate":"2022-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"137805809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Deterministic Gated Lognormal Response Time Model to Identify Examinees with Item Preknowledge","authors":"Murat Kasli, Cengiz Zopluoglu, Sarah L. Toton","doi":"10.1111/jedm.12340","DOIUrl":"https://doi.org/10.1111/jedm.12340","url":null,"abstract":"<p>Response times (RTs) have recently attracted a significant amount of attention in the literature as they may provide meaningful information about item preknowledge. In this study, a new model, the Deterministic Gated Lognormal Response Time (DG-LNRT) model, is proposed to identify examinees with item preknowledge using RTs. The proposed model was applied to two different data sets and performance was assessed with false-positive rates, true-positive rates, and precision. The results were compared with another recently proposed Z-statistic. Follow-up simulation studies were also conducted to examine model performance in settings similar to the real data sets. The results indicate that the proposed model is viable and can help detect item preknowledge under certain conditions. However, its performance is highly dependent on the correct specification of the compromised items.</p>","PeriodicalId":47871,"journal":{"name":"Journal of Educational Measurement","volume":"60 1","pages":"148-169"},"PeriodicalIF":1.3,"publicationDate":"2022-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50123901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}