Elle Lett, Shakiba Shahbandegan, Yuval Barak-Corren, Andrew M Fine, William G La Cava
{"title":"Intersectional and Marginal Debiasing in Prediction Models for Emergency Admissions.","authors":"Elle Lett, Shakiba Shahbandegan, Yuval Barak-Corren, Andrew M Fine, William G La Cava","doi":"10.1001/jamanetworkopen.2025.12947","DOIUrl":null,"url":null,"abstract":"<p><strong>Importance: </strong>Fair clinical prediction models are crucial for achieving equitable health outcomes. Intersectionality has been applied to develop algorithms that address discrimination among intersections of protected attributes (eg, Black women rather than Black persons or women separately), yet most fair algorithms default to marginal debiasing, optimizing performance across simplified patient subgroups.</p><p><strong>Objective: </strong>To assess the extent to which simplifying patient subgroups during training is associated with intersectional subgroup performance in emergency department (ED) admission models.</p><p><strong>Design, setting, and participants: </strong>This prognostic study of admission prediction models used retrospective data from ED visits to Beth Israel Deaconess Medical Center Medical Information Mart for Intensive Care IV (MIMIC-IV; n = 160 016) from January 1, 2011, to December 31, 2019, and Boston Children's Hospital (BCH; n = 22 222) from June 1 through August 13, 2019. Statistical analysis was conducted from January 2022 to August 2024.</p><p><strong>Main outcomes and measures: </strong>The primary outcome was admission to an in-patient service. The accuracy of admission predictions among intersectional subgroups was measured under variations on model training with respect to optimizing for group level performance. Under different fairness definitions (calibration, error rate balance) and modeling methods (linear, nonlinear), overall performance and subgroup performance of marginal debiasing approaches were compared with intersectional debiasing approaches. Subgroups were defined by self-reported race and ethnicity and gender. Measures include area under the receiver operator characteristic curve (AUROC), area under the precision recall curve, subgroup calibration error, and false-negative rates.</p><p><strong>Results: </strong>The MIMIC-IV cohort included 160 016 visits (mean [SD] age, 53.0 [19.3] years; 57.4% female patients; 0.3% American Indian or Alaska Native patients, 3.7% Asian patients, 26.2% Black patients, 10.0% Hispanic or Latino patients, and 59.7% White patients; 29.5% admitted) and the BCH cohort included 22 222 visits (mean [SD] age, 8.2 [6.8] years; 52.1% male patients; 0.1% American Indian or Alaska Native patients, 4.0% Asian patients, 19.7% Black patients, 30.6% Hispanic or Latino patients, 0.2% Native Hawaiian or Pacific Islander patients, 37.7% White patients; 16.3% admitted). Among MIMIC-IV groups, intersectional debiasing was associated with a reduced subgroup calibration error from 0.083 to 0.065 (22.3%), while marginal fairness debiasing was associated with a reduced subgroup calibration error from 0.083 to 0.074 (11.3%; difference, 11.1%); among BCH groups, intersectional debiasing was associated with a reduced subgroup calibration error from 0.111 to 0.080 (28.3%), while marginal fairness debiasing was associated with a reduced subgroup calibration error from 0.111 to 0.086 (22.6%; difference, 5.7%). Among MIMIC-IV groups, intersectional debiasing was associated with lowered subgroup false-negative rates from 0.142 to 0.125 (11.9%), while marginal debiasing was associated with lowered subgroup false-negative rates from 0.142 to 0.132 (6.8%; difference, 5.1%). Fairness improvements did not decrease overall accuracy compared with baseline models (eg, MIMIC-IV: mean [SD] AUROC, 0.85 [0.00], both models). Intersectional debiasing was associated with lowered error rates in several intersectional subpopulations compared with other strategies.</p><p><strong>Conclusions and relevance: </strong>This study suggests that intersectional debiasing better mitigates performance disparities across intersecting groups than marginal debiasing for admission prediction. Intersectionally debiased models were associated with reduced group-specific errors without compromising overall accuracy. Clinical risk prediction models should consider incorporating intersectional debiasing into their development.</p>","PeriodicalId":14694,"journal":{"name":"JAMA Network Open","volume":"8 5","pages":"e2512947"},"PeriodicalIF":10.5000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123471/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JAMA Network Open","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1001/jamanetworkopen.2025.12947","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MEDICINE, GENERAL & INTERNAL","Score":null,"Total":0}
引用次数: 0
Abstract
Importance: Fair clinical prediction models are crucial for achieving equitable health outcomes. Intersectionality has been applied to develop algorithms that address discrimination among intersections of protected attributes (eg, Black women rather than Black persons or women separately), yet most fair algorithms default to marginal debiasing, optimizing performance across simplified patient subgroups.
Objective: To assess the extent to which simplifying patient subgroups during training is associated with intersectional subgroup performance in emergency department (ED) admission models.
Design, setting, and participants: This prognostic study of admission prediction models used retrospective data from ED visits to Beth Israel Deaconess Medical Center Medical Information Mart for Intensive Care IV (MIMIC-IV; n = 160 016) from January 1, 2011, to December 31, 2019, and Boston Children's Hospital (BCH; n = 22 222) from June 1 through August 13, 2019. Statistical analysis was conducted from January 2022 to August 2024.
Main outcomes and measures: The primary outcome was admission to an in-patient service. The accuracy of admission predictions among intersectional subgroups was measured under variations on model training with respect to optimizing for group level performance. Under different fairness definitions (calibration, error rate balance) and modeling methods (linear, nonlinear), overall performance and subgroup performance of marginal debiasing approaches were compared with intersectional debiasing approaches. Subgroups were defined by self-reported race and ethnicity and gender. Measures include area under the receiver operator characteristic curve (AUROC), area under the precision recall curve, subgroup calibration error, and false-negative rates.
Results: The MIMIC-IV cohort included 160 016 visits (mean [SD] age, 53.0 [19.3] years; 57.4% female patients; 0.3% American Indian or Alaska Native patients, 3.7% Asian patients, 26.2% Black patients, 10.0% Hispanic or Latino patients, and 59.7% White patients; 29.5% admitted) and the BCH cohort included 22 222 visits (mean [SD] age, 8.2 [6.8] years; 52.1% male patients; 0.1% American Indian or Alaska Native patients, 4.0% Asian patients, 19.7% Black patients, 30.6% Hispanic or Latino patients, 0.2% Native Hawaiian or Pacific Islander patients, 37.7% White patients; 16.3% admitted). Among MIMIC-IV groups, intersectional debiasing was associated with a reduced subgroup calibration error from 0.083 to 0.065 (22.3%), while marginal fairness debiasing was associated with a reduced subgroup calibration error from 0.083 to 0.074 (11.3%; difference, 11.1%); among BCH groups, intersectional debiasing was associated with a reduced subgroup calibration error from 0.111 to 0.080 (28.3%), while marginal fairness debiasing was associated with a reduced subgroup calibration error from 0.111 to 0.086 (22.6%; difference, 5.7%). Among MIMIC-IV groups, intersectional debiasing was associated with lowered subgroup false-negative rates from 0.142 to 0.125 (11.9%), while marginal debiasing was associated with lowered subgroup false-negative rates from 0.142 to 0.132 (6.8%; difference, 5.1%). Fairness improvements did not decrease overall accuracy compared with baseline models (eg, MIMIC-IV: mean [SD] AUROC, 0.85 [0.00], both models). Intersectional debiasing was associated with lowered error rates in several intersectional subpopulations compared with other strategies.
Conclusions and relevance: This study suggests that intersectional debiasing better mitigates performance disparities across intersecting groups than marginal debiasing for admission prediction. Intersectionally debiased models were associated with reduced group-specific errors without compromising overall accuracy. Clinical risk prediction models should consider incorporating intersectional debiasing into their development.
期刊介绍:
JAMA Network Open, a member of the esteemed JAMA Network, stands as an international, peer-reviewed, open-access general medical journal.The publication is dedicated to disseminating research across various health disciplines and countries, encompassing clinical care, innovation in health care, health policy, and global health.
JAMA Network Open caters to clinicians, investigators, and policymakers, providing a platform for valuable insights and advancements in the medical field. As part of the JAMA Network, a consortium of peer-reviewed general medical and specialty publications, JAMA Network Open contributes to the collective knowledge and understanding within the medical community.