{"title":"Circumventing construct-irrelevant variance in international assessments using cognitive diagnostic modeling: A curriculum-sensitive measure","authors":"","doi":"10.1016/j.stueduc.2024.101393","DOIUrl":null,"url":null,"abstract":"<div><p>International large-scale assessments such as TIMSS administer achievement tests that are based on an analysis of national curricula to compare student achievement across countries. The organizations that coordinate these studies use Rasch or more generalized item response theory (IRT) models in which all test items are assumed to measure a single latent ability. The test responses are then used to estimate this ability, and the ability scores are used to compare countries.</p><p>A central but yet-to-be-contested assumption of this approach is that the achievement tests measure an unobserved unidimensional continuous variable that is comparable across countries. One threat to this assumption is the fact that countries and even regions or school tracks within countries have different curricula. When seeking to fairly compare countries, it seems legitimate to account for the fact that applicable curricula differ and that some students may not have been taught the full test content yet. When seeking to fairly compare countries, it seems imperative to account for the fact that national curricula differ and that some countries may not have taught the full test content yet. Nevertheless, existing IRT-based rankings ignore such differences.</p><p>The present study proposes a direct method to deal with differing curricula and create a fair ranking of educational quality between countries. The new method compares countries solely on test content that has already been taught; it uses information on whether students have mastered skills taught in class or not and does not consider contents that have not been taught yet. Mastery is assessed via the deterministic-input, noisy, “and” gate (DINA) model, an interpretable and tractable cognitive diagnostic model. To illustrate the new method, we use data from TIMSS 1995 and compare it to the IRT-based scores published in the international study report. We find a mismatch between the TIMSS test contents and national curricula in all countries. At the same time, we observe a high correlation between the scores based on the new method and the conventional IRT scores. This finding underscores the robustness of the performance measures reported in TIMSS despite existing differences across national curricula.</p></div>","PeriodicalId":47539,"journal":{"name":"Studies in Educational Evaluation","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Studies in Educational Evaluation","FirstCategoryId":"95","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0191491X24000725","RegionNum":2,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0
Abstract
International large-scale assessments such as TIMSS administer achievement tests that are based on an analysis of national curricula to compare student achievement across countries. The organizations that coordinate these studies use Rasch or more generalized item response theory (IRT) models in which all test items are assumed to measure a single latent ability. The test responses are then used to estimate this ability, and the ability scores are used to compare countries.
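For readers less familiar with this scaling approach, the Rasch model referred to above can be stated in its standard textbook form; the notation below, with θ_i for the ability of student i and b_j for the difficulty of item j, is a generic statement of the model rather than the TIMSS scaling specification:

$$P(X_{ij} = 1 \mid \theta_i) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}$$

More general IRT models add item discrimination and guessing parameters, but all of them summarize each student's responses through a single continuous latent ability θ_i.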
A central but yet-to-be-contested assumption of this approach is that the achievement tests measure an unobserved unidimensional continuous variable that is comparable across countries. One threat to this assumption is the fact that countries, and even regions or school tracks within countries, have different curricula. When seeking to fairly compare countries, it seems imperative to account for the fact that national curricula differ and that some countries may not have taught the full test content yet. Nevertheless, existing IRT-based rankings ignore such differences.
The present study proposes a direct method to deal with differing curricula and create a fair ranking of educational quality across countries. The new method compares countries solely on test content that has already been taught: it uses information on whether students have mastered the skills taught in class and disregards content that has not yet been taught. Mastery is assessed via the deterministic-input, noisy, “and” gate (DINA) model, an interpretable and tractable cognitive diagnostic model. To illustrate the new method, we use data from TIMSS 1995 and compare the resulting scores to the IRT-based scores published in the international study report. We find a mismatch between the TIMSS test contents and national curricula in all countries. At the same time, we observe a high correlation between the scores based on the new method and the conventional IRT scores. This finding underscores the robustness of the performance measures reported in TIMSS despite existing differences across national curricula.
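As a rough illustration of the measurement model involved, the sketch below implements the DINA item response function under common textbook assumptions; the function name, the toy Q-matrix, and the slip/guess values are hypothetical and do not come from the article or the TIMSS data.

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """Correct-response probabilities for one student under the DINA model.

    alpha : (K,) binary vector of skill mastery for the student
    q     : (J, K) binary Q-matrix; q[j, k] == 1 if item j requires skill k
    slip  : (J,) slip probabilities s_j
    guess : (J,) guessing probabilities g_j
    """
    # eta_j = 1 iff the student masters every skill required by item j
    eta = np.all(alpha >= q, axis=1)
    # P(correct) = 1 - s_j when all required skills are mastered, else g_j
    return np.where(eta, 1 - slip, guess)

# Hypothetical toy example: 2 skills, 3 items
alpha = np.array([1, 0])                 # masters skill 1 only
q = np.array([[1, 0],                    # item 1 requires skill 1
              [0, 1],                    # item 2 requires skill 2
              [1, 1]])                   # item 3 requires both skills
slip = np.array([0.1, 0.1, 0.2])
guess = np.array([0.2, 0.2, 0.1])
print(dina_prob(alpha, q, slip, guess))  # [0.9 0.2 0.1]
```

In the curriculum-sensitive comparison described above, one would additionally restrict attention to the skills a country's curriculum has already covered before summarizing mastery at the country level; that filtering step is specific to the article and is not shown in this sketch.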
Journal Introduction:
Studies in Educational Evaluation publishes original reports of evaluation studies. Four types of articles are published by the journal: (a) Empirical evaluation studies representing evaluation practice in educational systems around the world; (b) Theoretical reflections and empirical studies related to issues involved in the evaluation of educational programs, educational institutions, educational personnel and student assessment; (c) Articles summarizing the state-of-the-art concerning specific topics in evaluation in general or in a particular country or group of countries; (d) Book reviews and brief abstracts of evaluation studies.