Survival After Radical Cystectomy for Bladder Cancer: Development of a Fair Machine Learning Model.

IF 3.1 · Zone 3 (Medicine) · Q2 Medical Informatics

JMIR Medical Informatics · Published: 2024-12-13 · DOI: 10.2196/63289

Samuel Carbunaru, Yassamin Neshatvar, Hyungrok Do, Katie Murray, Rajesh Ranganath, Madhur Nayan

JMIR Medical Informatics, vol. 12, e63289. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11694706/pdf/


Abstract

Background: Prediction models based on machine learning (ML) methods are being increasingly developed and adopted in health care. However, these models may be prone to bias and considered unfair if they demonstrate variable performance in population subgroups. An unfair model is of particular concern in bladder cancer, where disparities have been identified in sex and racial subgroups.

Objective: This study aims (1) to develop an ML model to predict survival after radical cystectomy for bladder cancer and evaluate the model for potential bias in sex and racial subgroups and (2) to compare algorithm unfairness mitigation techniques to improve model fairness.

Methods: We trained and compared various ML classification algorithms to predict 5-year survival after radical cystectomy using the National Cancer Database. The primary model performance metric was the F₁-score. The primary metric for model fairness was the equalized odds ratio (eOR). We compared 3 algorithm unfairness mitigation techniques to improve the eOR.
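The two headline metrics can be made concrete with a short sketch. The snippet below is a minimal, dependency-free Python illustration, not the authors' code. It assumes binary labels with 1 as the positive class (the abstract does not say whether survival or death was coded as positive, and the F₁-score depends on that choice), and it follows the usual definition of the equalized odds ratio, as used by Fairlearn's `equalized_odds_ratio`: the smaller of the between-group true positive rate ratio and false positive rate ratio, each computed as the lowest subgroup rate divided by the highest. An eOR of 1 means identical error rates across subgroups. The function names and toy data are hypothetical.

```python
"""Sketch: F1-score and equalized odds ratio (eOR) for binary predictions."""


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def _tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate within one subgroup."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("each subgroup needs both positive and negative labels")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / pos, fp / neg


def _min_over_max(rates):
    """Between-group ratio: smallest rate / largest rate.

    Returns 1.0 when every rate is zero (a convention chosen for this sketch).
    """
    hi = max(rates)
    return min(rates) / hi if hi > 0 else 1.0


def equalized_odds_ratio(y_true, y_pred, groups):
    """Smaller of the between-group TPR ratio and FPR ratio."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        ts, ps = by_group.setdefault(g, ([], []))
        ts.append(t)
        ps.append(p)
    rates = [_tpr_fpr(ts, ps) for ts, ps in by_group.values()]
    tpr_ratio = _min_over_max([tpr for tpr, _ in rates])
    fpr_ratio = _min_over_max([fpr for _, fpr in rates])
    return min(tpr_ratio, fpr_ratio)


if __name__ == "__main__":
    # Hypothetical toy data: 5 female and 6 male patients.
    y_true = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0]
    groups = ["female"] * 5 + ["male"] * 6
    print(f"F1  = {f1_score(y_true, y_pred):.3f}")  # 8/11 -> 0.727
    print(f"eOR = {equalized_odds_ratio(y_true, y_pred, groups):.3f}")  # 0.500
```

In the toy data, the female subgroup has a TPR of 2/3 and an FPR of 1/2, and the male subgroup has a TPR of 1 and an FPR of 1/4, so the TPR ratio is about 0.667, the FPR ratio is 0.5, and the eOR is the smaller value, 0.5. Because the eOR takes the worst of the two ratios, a model can score well overall on F₁ while still scoring poorly on eOR.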

Results: We identified 16,481 patients, of whom 23.1% (n=3800) were female; by racial group, 91.5% (n=15,080) were "White," 5% (n=832) were "Black," 2.3% (n=373) were "Hispanic," and 1.2% (n=196) were "Asian." The 5-year mortality rate was 75% (n=12,290). The best naive (unmitigated) model was extreme gradient boosting (XGBoost), which had an F₁-score of 0.860 and an eOR of 0.619. All unfairness mitigation techniques increased the eOR, with the correlation remover producing the largest increase, to a final eOR of 0.750. This mitigated model had F₁-scores of 0.86, 0.904, and 0.824 in the full, Black male, and Asian female test sets, respectively.
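The correlation remover is assumed here to be the preprocessing technique of that name in Fairlearn (`fairlearn.preprocessing.CorrelationRemover`), which linearly removes the correlation between the non-sensitive features and the sensitive features before a model is trained. The pure-Python sketch below illustrates the idea for a single numeric sensitive column (for example, sex coded 0/1). The function name `remove_correlation`, the toy rows, and the single-column simplification are illustrative assumptions; Fairlearn's implementation handles several sensitive columns at once with a multivariate least-squares fit. Each non-sensitive column is regressed on the centered sensitive column and replaced by its residual, and `alpha` blends the filtered and original values, with `alpha=1.0` removing all linear correlation.

```python
"""Sketch of linear correlation removal for one sensitive column (hypothetical helper)."""


def remove_correlation(rows, sensitive_idx, alpha=1.0):
    """Return rows without the sensitive column, each remaining column
    decorrelated from it: x_filtered = x - beta * (s - mean(s)),
    then blended as alpha * x_filtered + (1 - alpha) * x."""
    n = len(rows)
    s = [row[sensitive_idx] for row in rows]
    s_mean = sum(s) / n
    s_c = [v - s_mean for v in s]  # centered sensitive feature
    ss = sum(v * v for v in s_c)
    keep = [j for j in range(len(rows[0])) if j != sensitive_idx]

    out = [[] for _ in range(n)]
    for j in keep:
        x = [row[j] for row in rows]
        # Least-squares slope of column j on the centered sensitive feature.
        beta = sum(a * b for a, b in zip(s_c, x)) / ss if ss else 0.0
        for i in range(n):
            filtered = x[i] - beta * s_c[i]
            out[i].append(alpha * filtered + (1 - alpha) * x[i])
    return out


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical rows: [sex (1 = female), age, comorbidity score]
    rows = [[1, 60.0, 2.0], [0, 55.0, 3.0], [1, 70.0, 1.0], [0, 50.0, 4.0]]
    sex = [r[0] for r in rows]
    filtered = remove_correlation(rows, sensitive_idx=0)
    for j in range(2):
        col = [r[j] for r in filtered]
        print(f"column {j}: cov with sex = {covariance(col, sex):.6f}")
```

For brevity the sketch fits and transforms in one call; in a real pipeline the slopes would be estimated on the training split only and then applied to the test split, so that test-set statistics do not leak into the model.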

Conclusions: The ML model predicting survival after radical cystectomy exhibited bias across sex and racial subgroups. By using algorithm unfairness mitigation techniques, we improved algorithmic fairness as measured by the eOR. Our study highlights the importance of not only evaluating models for bias but also actively mitigating such disparities to ensure equitable health care delivery. We also deployed the first web-based fair ML model for predicting survival after radical cystectomy.


Source journal

JMIR Medical Informatics (Medicine-Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It focuses on applied, translational research and has a broad readership including clinicians, CIOs, engineers, industry, and health informatics professionals. Published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175), JMIR Med Inform has a slightly different scope (placing more emphasis on applications for clinicians and health professionals than on consumers/citizens, who are the focus of JMIR), publishes even faster, and also accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.