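Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases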

IF 4 | Zone 2 (Medicine) | Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Journal of Biomedical Informatics, Volume 156, Article 104677 | Published 2024-06-13 | DOI: 10.1016/j.jbi.2024.104677

Yifei Wang , Liqin Wang , Zhengyang Zhou , John Laurentiev , Joshua R. Lakin , Li Zhou , Pengyu Hong


Citations: 0

Abstract



Objective

Existing approaches to fairness evaluation often overlook systematic differences between comparison groups in the social determinants of health, such as demographics and socioeconomic status, which can lead to inaccurate or even contradictory conclusions. This study aims to evaluate racial disparities in mortality prediction for patients with chronic diseases using a fairness detection method that accounts for these systematic differences.

Methods

We created five datasets from the electronic health records (EHR) of Mass General Brigham, each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performance between Black and White individuals. We then compared two fairness evaluations: one on the overall Black and White populations, and one on matched counterparts, namely Black individuals and the White individuals matched to them by propensity score matching, which mitigates the systematic differences between the groups.
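The abstract does not state how the matching was configured, so the following is only a minimal sketch of one common propensity score matching setup, written in Python with NumPy. It assumes a logistic-regression propensity model, greedy 1:1 nearest-neighbor matching without replacement on the logit of the score, and a caliper of 0.2 standard deviations of that logit. The function names (`propensity_scores`, `match_nearest`, `smd`) and the synthetic data are hypothetical, and `group == 1` stands in for the Black patients to whom White patients (`group == 0`) are matched. The standardized mean difference (SMD) serves as a simple check that matching reduced covariate imbalance.

```python
import numpy as np

def propensity_scores(X, group, n_iter=25, ridge=1e-3):
    """Estimate P(group == 1 | X) with a lightly ridge-penalized logistic
    regression fitted by Newton-Raphson (hypothetical helper)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (group - p) - ridge * beta  # gradient of penalized log-likelihood
        hess = Xb.T @ (Xb * (p * (1 - p))[:, None]) + ridge * np.eye(len(beta))
        beta += np.linalg.solve(hess, grad)       # Newton step
    p = 1.0 / (1.0 + np.exp(-Xb @ beta))
    return np.clip(p, 1e-6, 1 - 1e-6)             # keep the logit finite

def match_nearest(scores, group, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement on the logit of
    the propensity score; pairs farther apart than the caliper are discarded."""
    logit = np.log(scores / (1 - scores))
    caliper = caliper_sd * logit.std()
    controls = list(np.flatnonzero(group == 0))
    pairs = []
    for t in np.flatnonzero(group == 1):
        if not controls:
            break
        dist = np.abs(logit[controls] - logit[t])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            pairs.append((int(t), int(controls.pop(j))))  # each control used once
    return pairs

def smd(x, group):
    """Standardized mean difference of covariate x (group 1 minus group 0)."""
    a, b = x[group == 1], x[group == 0]
    return (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)

# Synthetic illustration: group 1 is shifted on both covariates.
rng = np.random.default_rng(0)
group = rng.binomial(1, 0.3, size=2000)
X = rng.normal(size=(2000, 2)) + 0.8 * group[:, None]

ps = propensity_scores(X, group)
pairs = match_nearest(ps, group)
idx = np.array([k for pair in pairs for k in pair])
before = smd(X[:, 0], group)
after = smd(X[idx, 0], group[idx])
print(f"{len(pairs)} pairs; SMD before {before:.2f}, after {after:.2f}")
```

Greedy matching visits group-1 patients in index order, so the result can depend on that order; optimal matching avoids this by minimizing the total distance over all pairs, at a higher computational cost.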

Results

We identified significant differences between Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (p < 0.001). In the matched Black and White subpopulations identified through propensity score matching, significant differences remained for only a few covariates, and at weaker significance levels: insurance type in the CHF cohort (p = 0.043), insurance type (p = 0.005) and education level (p = 0.016) in the CKD cohort, and body mass index in the dementia cohort (p = 0.041). No other covariates differed significantly. We then compared the fairness evaluations of the mortality prediction models across the five cohorts before and after mitigating systematic differences. The two evaluations differed significantly in the CHF cohort: for the AdaBoost model, p = 0.021 for the F1 measure and p = 0.001 for sensitivity; for the MLP model, p = 0.014 for the F1 measure and p = 0.003 for sensitivity.
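The abstract reports p-values for the F1 measure and sensitivity but does not name the statistical test. As an illustration only, the sketch below computes per-group sensitivity and F1, the gap between groups (group 1 minus group 0), and a two-sided permutation p-value for that gap. The permutation test is a swapped-in technique, not necessarily the one used in the study, and the function names and toy data are hypothetical.

```python
import numpy as np

def sensitivity_f1(y_true, y_pred):
    """Sensitivity (recall) and F1 for binary labels; 0.0 when a denominator is empty."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sens = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn > 0 else 0.0
    return float(sens), float(f1)

def group_gap(y_true, y_pred, group):
    """(sensitivity gap, F1 gap), each computed as group 1 minus group 0."""
    s1, f1_1 = sensitivity_f1(y_true[group == 1], y_pred[group == 1])
    s0, f1_0 = sensitivity_f1(y_true[group == 0], y_pred[group == 0])
    return s1 - s0, f1_1 - f1_0

def permutation_p_value(y_true, y_pred, group, metric=0, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gap (metric 0 = sensitivity, 1 = F1).
    Shuffling group labels simulates the null hypothesis of no group difference."""
    rng = np.random.default_rng(seed)
    observed = abs(group_gap(y_true, y_pred, group)[metric])
    hits = sum(
        abs(group_gap(y_true, y_pred, rng.permutation(group))[metric]) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0

# Toy example: hypothetical labels and predictions for 8 patients (group 1 = first four).
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 1, 1, 0, 0, 0])
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
sens_gap, f1_gap = group_gap(y_true, y_pred, group)
p = permutation_p_value(y_true, y_pred, group, metric=0)
print(f"sensitivity gap {sens_gap:.2f}, F1 gap {f1_gap:.2f}, permutation p = {p:.3f}")
```

Running `group_gap` once on the full cohort and once on the matched subset yields the before-and-after fairness comparison described in the Results; the abstract does not say how the study tested the difference between those two evaluations.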

Discussion and conclusion

This study contributes to research on fairness assessment by examining systematic disparities between comparison groups, and it underscores the potential of such examination for revealing racial bias in machine learning models used in clinical settings.


Source journal

Journal of Biomedical Informatics (Medicine: Computer Science, Interdisciplinary Applications)

CiteScore: 8.90
Self-citation rate: 6.70%
Articles published: 243
Review time: 32 days

About the journal: The Journal of Biomedical Informatics reflects a commitment to high-quality original research papers, reviews, and commentaries in the area of biomedical informatics methodology. Although we publish articles motivated by applications in the biomedical sciences (for example, clinical medicine, health care, population health, and translational bioinformatics), the journal emphasizes reports of new methodologies and techniques that have general applicability and that form the basis for the evolving science of biomedical informatics. Articles on medical devices; evaluations of implemented systems (including clinical trials of information technologies); or papers that provide insight into a biological process, a specific disease, or treatment options would generally be more suitable for publication in other venues. Papers on applications of signal processing and image analysis are often more suitable for biomedical engineering journals or other informatics journals, although we do publish papers that emphasize the information management and knowledge representation/modeling issues that arise in the storage and use of biological signals and images. System descriptions are welcome if they illustrate and substantiate the underlying methodology that is the principal focus of the report and an effort is made to address the generalizability and/or range of application of that methodology. Note also that, given the international nature of JBI, papers that deal with specific languages other than English, or with country-specific health systems or approaches, are acceptable for JBI only if they offer generalizable lessons that are relevant to the broad JBI readership, regardless of their country, language, culture, or health system.