{"title":"Conceptualizing bias in EHR data: A case study in performance disparities by demographic subgroups for a pediatric obesity incidence classifier.","authors":"Elizabeth A Campbell, Saurav Bose, Aaron J Masino","doi":"10.1371/journal.pdig.0000642","DOIUrl":null,"url":null,"abstract":"<p><p>Electronic Health Records (EHRs) are increasingly used to develop machine learning models in predictive medicine. There has been limited research on utilizing machine learning methods to predict childhood obesity and related disparities in classifier performance among vulnerable patient subpopulations. In this work, classification models are developed to recognize pediatric obesity using temporal condition patterns obtained from patient EHR data in a U.S. study population. We trained four machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosted Trees, and Neural Networks) to classify cases and controls as obesity positive or negative, and optimized hyperparameter settings through a bootstrapping methodology. To assess the classifiers for bias, we studied model performance by population subgroups then used permutation analysis to identify the most predictive features for each model and the demographic characteristics of patients with these features. Mean AUC-ROC values were consistent across classifiers, ranging from 0.72-0.80. Some evidence of bias was identified, although this was through the models performing better for minority subgroups (African Americans and patients enrolled in Medicaid). Permutation analysis revealed that patients from vulnerable population subgroups were over-represented among patients with the most predictive diagnostic patterns. We hypothesize that our models performed better on under-represented groups because the features more strongly associated with obesity were more commonly observed among minority patients. These findings highlight the complex ways that bias may arise in machine learning models and can be incorporated into future research to develop a thorough analytical approach to identify and mitigate bias that may arise from features and within EHR datasets when developing more equitable models.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"3 10","pages":"e0000642"},"PeriodicalIF":0.0000,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11498669/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLOS digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1371/journal.pdig.0000642","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/10/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Electronic Health Records (EHRs) are increasingly used to develop machine learning models in predictive medicine. There has been limited research on using machine learning to predict childhood obesity, or on the related disparities in classifier performance among vulnerable patient subpopulations. In this work, we developed classification models to recognize pediatric obesity using temporal condition patterns obtained from patient EHR data in a U.S. study population. We trained four machine learning algorithms (Logistic Regression, Random Forest, Gradient Boosted Trees, and Neural Networks) to classify cases and controls as obesity positive or negative, and optimized hyperparameter settings through a bootstrapping methodology. To assess the classifiers for bias, we studied model performance across population subgroups and then used permutation analysis to identify the most predictive features for each model, along with the demographic characteristics of patients with these features. Mean AUC-ROC values were consistent across classifiers, ranging from 0.72 to 0.80. Some evidence of bias was identified, although it took the form of the models performing better for minority subgroups (African Americans and patients enrolled in Medicaid). Permutation analysis revealed that patients from vulnerable population subgroups were over-represented among patients with the most predictive diagnostic patterns. We hypothesize that our models performed better on under-represented groups because the features most strongly associated with obesity were more commonly observed among minority patients. These findings highlight the complex ways in which bias may arise in machine learning models; future research can build on them to develop a thorough analytical approach for identifying and mitigating bias arising from features within EHR datasets when developing more equitable models.
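
As a concrete illustration of the training setup the abstract describes, the sketch below trains the same four classifier families, selects hyperparameters by mean AUC-ROC over bootstrap resamples, and reports held-out AUC-ROC. It is a minimal, hypothetical reconstruction, not the authors' pipeline: the synthetic feature matrix, the small hyperparameter grids, and the bootstrap count are assumptions standing in for the paper's EHR-derived temporal condition patterns.

```python
# Minimal sketch (NOT the authors' code): four classifiers, hyperparameters
# chosen by mean AUC-ROC over bootstrap resamples, final test-set AUC-ROC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

# Synthetic stand-in for the binary-coded condition-pattern feature matrix.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Hypothetical, deliberately small hyperparameter grids.
candidates = {
    "logistic_regression": (LogisticRegression, {"C": [0.1, 1.0, 10.0], "max_iter": [1000]}),
    "random_forest": (RandomForestClassifier, {"n_estimators": [100, 300], "random_state": [0]}),
    "gradient_boosted_trees": (GradientBoostingClassifier, {"learning_rate": [0.05, 0.1], "random_state": [0]}),
    "neural_network": (MLPClassifier, {"hidden_layer_sizes": [(32,), (64, 32)], "max_iter": [500], "random_state": [0]}),
}

rng = np.random.RandomState(0)
for name, (Model, grid) in candidates.items():
    best_auc, best_params = -np.inf, None
    for params in ParameterGrid(grid):
        # Score each setting as mean AUC over bootstrap resamples: fit on a
        # resample, evaluate on the held-out (out-of-bag) rows.
        aucs = []
        for _ in range(10):
            idx = resample(np.arange(len(X_train)), random_state=rng)
            oob = np.setdiff1d(np.arange(len(X_train)), idx)
            model = Model(**params).fit(X_train[idx], y_train[idx])
            aucs.append(roc_auc_score(y_train[oob], model.predict_proba(X_train[oob])[:, 1]))
        if np.mean(aucs) > best_auc:
            best_auc, best_params = np.mean(aucs), params
    # Refit the best setting on the full training split and score on test.
    final = Model(**best_params).fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, final.predict_proba(X_test)[:, 1])
    print(f"{name}: bootstrap AUC={best_auc:.3f}, test AUC={test_auc:.3f}")
```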
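
The bias-assessment step can be sketched the same way: compute AUC-ROC separately within each demographic subgroup, then run permutation importance and inspect which subgroups are over-represented among patients carrying the top-ranked features. Everything here is illustrative; the model, the `group_a`/`group_b` labels (standing in for race or insurance status), and the `> 0` test for "carrying" a pattern are assumptions, not the authors' definitions.

```python
# Hedged sketch of the bias assessment: per-subgroup AUC-ROC, then
# permutation importance and the subgroup mix of top-feature carriers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Hypothetical demographic labels standing in for race or insurance status.
subgroup = np.random.RandomState(1).choice(["group_a", "group_b"], size=len(y_test))

# Per-subgroup AUC-ROC: a performance gap between groups is one signal of bias.
scores = model.predict_proba(X_test)[:, 1]
for g in np.unique(subgroup):
    mask = subgroup == g
    print(f"{g}: AUC={roc_auc_score(y_test[mask], scores[mask]):.3f}")

# Permutation importance: shuffle one feature at a time and record the AUC
# drop; the largest drops flag the most predictive features.
result = permutation_importance(model, X_test, y_test, scoring="roc_auc",
                                n_repeats=20, random_state=0)
for j in np.argsort(result.importances_mean)[::-1][:5]:
    # For binary condition-pattern features, check which subgroups are
    # over-represented among patients who have the pattern (here: value > 0).
    carriers = subgroup[X_test[:, j] > 0]
    counts = dict(zip(*np.unique(carriers, return_counts=True)))
    print(f"feature {j}: importance={result.importances_mean[j]:.4f}, carriers={counts}")
```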