Raising awareness of potential biases in medical machine learning: Experience from a Datathon.

PLOS Digital Health · Pub Date: 2025-07-11 · eCollection Date: 2025-07-01 · DOI: 10.1371/journal.pdig.0000932
Harry Hochheiser, Jesse Klug, Thomas Mathie, Tom J Pollard, Jesse D Raffa, Stephanie L Ballard, Evamarie A Conrad, Smitha Edakalavan, Allan Joseph, Nader Alnomasy, Sarah Nutman, Veronika Hill, Sumit Kapoor, Eddie Pérez Claudio, Olga V Kravchenko, Ruoting Li, Mehdi Nourelahi, Jenny Diaz, W Michael Taylor, Sydney R Rooney, Maeve Woeltje, Leo Anthony Celi, Christopher M Horvat
{"title":"Raising awareness of potential biases in medical machine learning: Experience from a Datathon.","authors":"Harry Hochheiser, Jesse Klug, Thomas Mathie, Tom J Pollard, Jesse D Raffa, Stephanie L Ballard, Evamarie A Conrad, Smitha Edakalavan, Allan Joseph, Nader Alnomasy, Sarah Nutman, Veronika Hill, Sumit Kapoor, Eddie Pérez Claudio, Olga V Kravchenko, Ruoting Li, Mehdi Nourelahi, Jenny Diaz, W Michael Taylor, Sydney R Rooney, Maeve Woeltje, Leo Anthony Celi, Christopher M Horvat","doi":"10.1371/journal.pdig.0000932","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To challenge clinicians and informaticians to learn about potential sources of bias in medical machine learning models through investigation of data and predictions from an open-source severity of illness score.</p><p><strong>Methods: </strong>Over a two-day period (total elapsed time approximately 28 hours), we conducted a datathon that challenged interdisciplinary teams to investigate potential sources of bias in the Global Open Source Severity of Illness Score. Teams were invited to develop hypotheses, to use tools of their choosing to identify potential sources of bias, and to provide a final report.</p><p><strong>Results: </strong>Five teams participated, three of which included both informaticians and clinicians. Most (4/5) used Python for analyses, the remaining team used R. Common analysis themes included relationship of the GOSSIS-1 prediction score with demographics and care related variables; relationships between demographics and outcomes; calibration and factors related to the context of care; and the impact of missingness. Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias.</p><p><strong>Discussion: </strong>Datathons are a promising approach for challenging developers and users to explore questions relating to unrecognized biases in medical machine learning algorithms.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"4 7","pages":"e0000932"},"PeriodicalIF":0.0000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12250157/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLOS digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1371/journal.pdig.0000932","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: To challenge clinicians and informaticians to learn about potential sources of bias in medical machine learning models through investigation of data and predictions from an open-source severity-of-illness score.

Methods: Over a two-day period (total elapsed time approximately 28 hours), we conducted a datathon that challenged interdisciplinary teams to investigate potential sources of bias in the Global Open Source Severity of Illness Score (GOSSIS-1). Teams were invited to develop hypotheses, to use tools of their choosing to identify potential sources of bias, and to provide a final report.

Results: Five teams participated, three of which included both informaticians and clinicians. Most teams (4/5) used Python for analyses; the remaining team used R. Common analysis themes included the relationship of the GOSSIS-1 prediction score with demographic and care-related variables; relationships between demographics and outcomes; calibration and factors related to the context of care; and the impact of missingness. Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias.
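
As a concrete illustration of the kind of subgroup analysis described above (a minimal sketch, not code from any participating team), the following Python snippet compares discrimination, calibration inputs, and missingness across demographic groups. The file name and the ethnicity, hospital_death, and gossis1_prob columns are assumed for illustration; the real GOSSIS-1 data dictionary defines its own schema.

```python
import pandas as pd
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

# Hypothetical export of GOSSIS-1 predictions joined with outcomes.
df = pd.read_csv("gossis1_predictions.csv")

for group, sub in df.groupby("ethnicity"):       # one demographic subgroup at a time
    y_true = sub["hospital_death"]               # observed in-hospital mortality (0/1)
    y_prob = sub["gossis1_prob"]                 # GOSSIS-1 predicted mortality risk
    if y_true.nunique() < 2:                     # AUROC is undefined for one-class groups
        continue
    auroc = roc_auc_score(y_true, y_prob)        # discrimination within the subgroup
    frac_pos, mean_pred = calibration_curve(     # observed vs. predicted risk by decile
        y_true, y_prob, n_bins=10, strategy="quantile")
    miss = sub.isna().mean().mean()              # crude overall missingness rate
    print(f"{group}: n={len(sub)}, AUROC={auroc:.3f}, missingness={miss:.1%}")
    # Diverging frac_pos vs. mean_pred curves across groups would suggest
    # the calibration differences identified as a possible source of bias.
```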

Discussion: Datathons are a promising approach for challenging developers and users to explore questions relating to unrecognized biases in medical machine learning algorithms.
