{"title":"A Psychometric Framework for Evaluating Fairness in Algorithmic Decision Making: Differential Algorithmic Functioning","authors":"Youmi Suk, K. T. Han","doi":"10.3102/10769986231171711","DOIUrl":"https://doi.org/10.3102/10769986231171711","url":null,"abstract":"As algorithmic decision making is increasingly deployed in every walk of life, many researchers have raised concerns about fairness-related bias from such algorithms. But there is little research on harnessing psychometric methods to uncover potential discriminatory bias inside decision-making algorithms. The main goal of this article is to propose a new framework for algorithmic fairness based on differential item functioning (DIF), which has been commonly used to measure item fairness in psychometrics. Our fairness notion, which we call differential algorithmic functioning (DAF), is defined based on three pieces of information: a decision variable, a “fair” variable, and a protected variable such as race or gender. Under the DAF framework, an algorithm can exhibit uniform DAF, nonuniform DAF, or neither (i.e., non-DAF). For detecting DAF, we provide modifications of well-established DIF methods: Mantel–Haenszel test, logistic regression, and residual-based DIF. We demonstrate our framework through a real dataset concerning decision-making algorithms for grade retention in K–12 education in the United States.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43676822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model Misspecification and Robustness of Observed-Score Test Equating Using Propensity Scores","authors":"G. Wallin, M. Wiberg","doi":"10.3102/10769986231161575","DOIUrl":"https://doi.org/10.3102/10769986231161575","url":null,"abstract":"This study explores the usefulness of covariates on equating test scores from nonequivalent test groups. The covariates are captured by an estimated propensity score, which is used as a proxy for latent ability to balance the test groups. The objective is to assess the sensitivity of the equated scores to various misspecifications in the propensity score model. The study assumes a parametric form of the propensity score and evaluates the effects of various misspecification scenarios on equating error. The results, based on both simulated and real testing data, show that (1) omitting an important covariate leads to biased estimates of the equated scores, (2) misspecifying a nonlinear relationship between the covariates and test scores increases the equating standard error in the tails of the score distributions, and (3) the equating estimators are robust against omitting a second-order term as well as using an incorrect link function in the propensity score estimation model. The findings demonstrate that auxiliary information is beneficial for test score equating in complex settings. However, it also sheds light on the challenge of making fair comparisons between nonequivalent test groups in the absence of common items. The study identifies scenarios, where equating performance is acceptable and problematic, provides practical guidelines, and identifies areas for further investigation.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"603 - 635"},"PeriodicalIF":2.4,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43284852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling Item-Level Heterogeneous Treatment Effects With the Explanatory Item Response Model: Leveraging Large-Scale Online Assessments to Pinpoint the Impact of Educational Interventions","authors":"Josh Gilbert, James S. Kim, Luke W. Miratrix","doi":"10.3102/10769986231171710","DOIUrl":"https://doi.org/10.3102/10769986231171710","url":null,"abstract":"Analyses that reveal how treatment effects vary allow researchers, practitioners, and policymakers to better understand the efficacy of educational interventions. In practice, however, standard statistical methods for addressing heterogeneous treatment effects (HTE) fail to address the HTE that may exist within outcome measures. In this study, we present a novel application of the explanatory item response model (EIRM) for assessing what we term “item-level” HTE (IL-HTE), in which a unique treatment effect is estimated for each item in an assessment. Results from data simulation reveal that when IL-HTE is present but ignored in the model, standard errors can be underestimated and false positive rates can increase. We then apply the EIRM to assess the impact of a literacy intervention focused on promoting transfer in reading comprehension on a digital assessment delivered online to approximately 8,000 third-grade students. We demonstrate that allowing for IL-HTE can reveal treatment effects at the item-level masked by a null average treatment effect, and the EIRM can thus provide fine-grained information for researchers and policymakers on the potentially heterogeneous causal effects of educational interventions.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43185120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive Diagnosis Testlet Model for Multiple-Choice Items","authors":"Lei Guo, Wenjie Zhou, Xiao Li","doi":"10.3102/10769986231165622","DOIUrl":"https://doi.org/10.3102/10769986231165622","url":null,"abstract":"The testlet design is very popular in educational and psychological assessments. This article proposes a new cognitive diagnosis model, the multiple-choice cognitive diagnostic testlet (MC-CDT) model for tests using testlets consisting of MC items. The MC-CDT model uses the original examinees’ responses to MC items instead of dichotomously scored data (i.e., correct or incorrect) to retain information of different distractors and thus enhance the MC items’ diagnostic power. The Markov chain Monte Carlo algorithm was adopted to calibrate the model using the WinBUGS software. Then, a thorough simulation study was conducted to evaluate the estimation accuracy for both item and examinee parameters in the MC-CDT model under various conditions. The results showed that the proposed MC-CDT model outperformed the traditional MC cognitive diagnostic model. Specifically, the MC-CDT model fits the testlet data better than the traditional model, while also fitting the data without testlets well. The findings of this empirical study show that the MC-CDT model fits real data better than the traditional model and that it can also provide testlet information.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47191461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Within-Group Approach to Ensemble Machine Learning Methods for Causal Inference in Multilevel Studies","authors":"Youmi Suk","doi":"10.3102/10769986231162096","DOIUrl":"https://doi.org/10.3102/10769986231162096","url":null,"abstract":"Machine learning (ML) methods for causal inference have gained popularity due to their flexibility to predict the outcome model and the propensity score. In this article, we provide a within-group approach for ML-based causal inference methods in order to robustly estimate average treatment effects in multilevel studies when there is cluster-level unmeasured confounding. We focus on one particular ML-based causal inference method based on the targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. Through our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Also, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates due to double robustness with flexible modeling, unlike parametric-based inverse propensity weighting methods. We demonstrate our proposed methods and conduct sensitivity analyses against the number of groups and individual-level unmeasured confounding to evaluate the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48737730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Transition Cognitive Diagnosis Model With Covariates: A Three-Step Approach","authors":"Qianru Liang, Jimmy de la Torre, N. Law","doi":"10.3102/10769986231163320","DOIUrl":"https://doi.org/10.3102/10769986231163320","url":null,"abstract":"To expand the use of cognitive diagnosis models (CDMs) to longitudinal assessments, this study proposes a bias-corrected three-step estimation approach for latent transition CDMs with covariates by integrating a general CDM and a latent transition model. The proposed method can be used to assess changes in attribute mastery status and attribute profiles and to evaluate the covariate effects on both the initial state and transition probabilities over time using latent (multinomial) logistic regression. Because stepwise approaches generally yield biased estimates, correction for classification error probabilities is considered in this study. The results of the simulation study showed that the proposed method yielded more accurate parameter estimates than the uncorrected approach. The use of the proposed method is also illustrated using a set of real data.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":" ","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47630907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diagnosing Primary Students’ Reading Progression: Is Cognitive Diagnostic Computerized Adaptive Testing the Way Forward?","authors":"Yan Li, Chao-Hsien Huang, Jia Liu","doi":"10.3102/10769986231160668","DOIUrl":"https://doi.org/10.3102/10769986231160668","url":null,"abstract":"Cognitive diagnostic computerized adaptive testing (CD-CAT) is a cutting-edge technology in educational measurement that targets at providing feedback on examinees’ strengths and weaknesses while increasing test accuracy and efficiency. To date, most CD-CAT studies have made methodological progress under simulated conditions, but little has applied CD-CAT to real educational assessment. The present study developed a Chinese reading comprehension item bank tapping into six validated reading attributes, with 195 items calibrated using data of 28,485 second to sixth graders and the item-level cognitive diagnostic models (CDMs). The measurement precision and efficiency of the reading CD-CAT system were compared and optimized in terms of crucial CD-CAT settings, including the CDMs for calibration, item selection methods, and termination rules. The study identified seven dominant reading attribute mastery profiles that stably exist across grades. These major clusters of readers and their variety with grade indicated some sort of reading developmental mechanisms that advance and deepen step by step at the primary school level. Results also suggested that compared to traditional linear tests, CD-CAT significantly improved the classification accuracy without imposing much testing burden. These findings may elucidate the multifaceted nature and possible learning paths of reading and raise the question of whether CD-CAT is applicable to other educational domains where there is a need to provide formative and fine-grained feedback but where there is a limited amount of test time.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"1 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41708221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Item Scores and Distractors to Detect Item Compromise and Preknowledge","authors":"Kylie Gorney, James A. Wollack, S. Sinharay, Carol Eckerly","doi":"10.3102/10769986231159923","DOIUrl":"https://doi.org/10.3102/10769986231159923","url":null,"abstract":"Any time examinees have had access to items and/or answers prior to taking a test, the fairness of the test and validity of test score interpretations are threatened. Therefore, there is a high demand for procedures to detect both compromised items (CI) and examinees with preknowledge (EWP). In this article, we develop a procedure that uses item scores and distractors to simultaneously detect CI and EWP. The false positive rate and true positive rate are evaluated for both items and examinees using detailed simulations. A real data example is also provided using data from an information technology certification exam.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"636 - 660"},"PeriodicalIF":2.4,"publicationDate":"2023-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49258978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Explicit Form With Continuous Attribute Profile of the Partial Mastery DINA Model","authors":"Tian Shu, Guanzhong Luo, Zhaosheng Luo, Xiaofeng Yu, Xiaojun Guo, Yujun Li","doi":"10.3102/10769986231159436","DOIUrl":"https://doi.org/10.3102/10769986231159436","url":null,"abstract":"Cognitive diagnosis models (CDMs) are the statistical framework for cognitive diagnostic assessment in education and psychology. They generally assume that subjects’ latent attributes are dichotomous—mastery or nonmastery, which seems quite deterministic. As an alternative to dichotomous attribute mastery, attention is drawn to the use of a continuous attribute mastery format in recent literature. To obtain subjects’ finer-grained attribute mastery for more precise diagnosis and guidance, an equivalent but more explicit form of the partial-mastery-deterministic inputs, noisy “and” gate (DINA) model (termed continuous attribute profile [CAP]-DINA form) is proposed in this article. Its parameters estimation algorithm based on this form using Bayesian techniques with Markov chain Monte Carlo algorithm is also presented. Two simulation studies are conducted then to explore its parameter recovery and model misspecification, and the results demonstrate that the CAP-DINA form performs robustly with satisfactory efficiency in these two aspects. A real data study of the English test also indicates it has a better model fit than DINA.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"573 - 602"},"PeriodicalIF":2.4,"publicationDate":"2023-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42823878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is It Who You Are or Where You Are? Accounting for Compositional Differences in Cross-Site Treatment Effect Variation","authors":"Benjamin Lu, E. Ben-Michael, A. Feller, Luke W. Miratrix","doi":"10.3102/10769986231155427","DOIUrl":"https://doi.org/10.3102/10769986231155427","url":null,"abstract":"In multisite trials, learning about treatment effect variation across sites is critical for understanding where and for whom a program works. Unadjusted comparisons, however, capture “compositional” differences in the distributions of unit-level features as well as “contextual” differences in site-level features, including possible differences in program implementation. Our goal in this article is to adjust site-level estimates for differences in the distribution of observed unit-level features: If we can reweight (or “transport”) each site to have a common distribution of observed unit-level covariates, the remaining treatment effect variation captures contextual and unobserved compositional differences across sites. This allows us to make apples-to-apples comparisons across sites, parceling out the amount of cross-site effect variation explained by systematic differences in populations served. In this article, we develop a framework for transporting effects using approximate balancing weights, where the weights are chosen to directly optimize unit-level covariate balance between each site and the common target distribution. We first develop our approach for the general setting of transporting the effect of a single-site trial. We then extend our method to multisite trials, assess its performance via simulation, and use it to analyze a series of multisite trials of adult education and vocational training programs. In our application, we find that distributional differences are potentially masking cross-site variation. Our method is available in the balancer R package.","PeriodicalId":48001,"journal":{"name":"Journal of Educational and Behavioral Statistics","volume":"48 1","pages":"420 - 453"},"PeriodicalIF":2.4,"publicationDate":"2023-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43898401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}