Bias in medical AI: Implications for clinical decision-making.

PLOS Digital Health · Pub Date: 2024-11-07 · eCollection Date: 2024-11-01 · DOI: 10.1371/journal.pdig.0000651
James L Cross, Michael A Choma, John A Onofrey
{"title":"Bias in medical AI: Implications for clinical decision-making.","authors":"James L Cross, Michael A Choma, John A Onofrey","doi":"10.1371/journal.pdig.0000651","DOIUrl":null,"url":null,"abstract":"<p><p>Biases in medical artificial intelligence (AI) arise and compound throughout the AI lifecycle. These biases can have significant clinical consequences, especially in applications that involve clinical decision-making. Left unaddressed, biased medical AI can lead to substandard clinical decisions and the perpetuation and exacerbation of longstanding healthcare disparities. We discuss potential biases that can arise at different stages in the AI development pipeline and how they can affect AI algorithms and clinical decision-making. Bias can occur in data features and labels, model development and evaluation, deployment, and publication. Insufficient sample sizes for certain patient groups can result in suboptimal performance, algorithm underestimation, and clinically unmeaningful predictions. Missing patient findings can also produce biased model behavior, including capturable but nonrandomly missing data, such as diagnosis codes, and data that is not usually or not easily captured, such as social determinants of health. Expertly annotated labels used to train supervised learning models may reflect implicit cognitive biases or substandard care practices. Overreliance on performance metrics during model development may obscure bias and diminish a model's clinical utility. When applied to data outside the training cohort, model performance can deteriorate from previous validation and can do so differentially across subgroups. How end users interact with deployed solutions can introduce bias. Finally, where models are developed and published, and by whom, impacts the trajectories and priorities of future medical AI development. Solutions to mitigate bias must be implemented with care, which include the collection of large and diverse data sets, statistical debiasing methods, thorough model evaluation, emphasis on model interpretability, and standardized bias reporting and transparency requirements. Prior to real-world implementation in clinical settings, rigorous validation through clinical trials is critical to demonstrate unbiased application. Addressing biases across model development stages is crucial for ensuring all patients benefit equitably from the future of medical AI.</p>","PeriodicalId":74465,"journal":{"name":"PLOS digital health","volume":"3 11","pages":"e0000651"},"PeriodicalIF":0.0000,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542778/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PLOS digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1371/journal.pdig.0000651","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Biases in medical artificial intelligence (AI) arise and compound throughout the AI lifecycle. These biases can have significant clinical consequences, especially in applications that involve clinical decision-making. Left unaddressed, biased medical AI can lead to substandard clinical decisions and the perpetuation and exacerbation of longstanding healthcare disparities. We discuss potential biases that can arise at different stages of the AI development pipeline and how they affect AI algorithms and clinical decision-making. Bias can occur in data features and labels, model development and evaluation, deployment, and publication. Insufficient sample sizes for certain patient groups can result in suboptimal performance, algorithm underestimation, and clinically meaningless predictions. Missing patient data can also produce biased model behavior: this includes data that are capturable but nonrandomly missing, such as diagnosis codes, and data that are not usually or not easily captured, such as social determinants of health. Expert-annotated labels used to train supervised learning models may reflect implicit cognitive biases or substandard care practices. Overreliance on performance metrics during model development may obscure bias and diminish a model's clinical utility. When applied to data outside the training cohort, model performance can deteriorate relative to prior validation and can do so differentially across subgroups. How end users interact with deployed solutions can also introduce bias. Finally, where models are developed and published, and by whom, shapes the trajectories and priorities of future medical AI development. Solutions to mitigate bias must be implemented with care; these include the collection of large and diverse data sets, statistical debiasing methods, thorough model evaluation, an emphasis on model interpretability, and standardized bias reporting and transparency requirements. Before real-world implementation in clinical settings, rigorous validation through clinical trials is critical to demonstrate unbiased application. Addressing biases across model development stages is crucial to ensuring that all patients benefit equitably from the future of medical AI.
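To make the subgroup-evaluation and debiasing points concrete, the following is a minimal, hypothetical sketch (not taken from the paper): it trains a classifier on a fully synthetic cohort containing one underrepresented subgroup, reweights training samples so each subgroup contributes equally to the loss (one simple statistical debiasing heuristic of the kind the abstract mentions), and reports AUROC per subgroup to surface differential performance. scikit-learn and NumPy are assumed; all data, features, and group labels are fabricated for illustration.

```python
# Illustrative sketch only: subgroup-stratified evaluation plus simple
# reweighting of an underrepresented group. All data below are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic cohort: 5,000 patients, 10 features, a binary outcome, and a
# hypothetical subgroup label where group "B" is underrepresented (10%).
n = 5000
X = rng.normal(size=(n, 10))
group = rng.choice(["A", "B"], size=n, p=[0.9, 0.1])
logit = X[:, 0] + 0.5 * X[:, 1] + (group == "B") * 0.8  # outcome shifts by group
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=group
)

# Reweight so each subgroup contributes equal total weight to the training
# loss -- a basic statistical debiasing heuristic, not the paper's method.
weights = np.ones(len(y_tr))
for g in np.unique(g_tr):
    weights[g_tr == g] = len(y_tr) / (2.0 * np.sum(g_tr == g))

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr, sample_weight=weights)

# Report overall AUROC and per-subgroup AUROC on held-out data; a large gap
# flags differential performance that an aggregate metric would hide.
scores = model.predict_proba(X_te)[:, 1]
print(f"overall AUROC: {roc_auc_score(y_te, scores):.3f}")
for g in np.unique(g_te):
    mask = g_te == g
    print(f"group {g} AUROC: {roc_auc_score(y_te[mask], scores[mask]):.3f}")
```

A large gap between subgroup AUROCs on held-out or external data is exactly the kind of differential deterioration the abstract warns about; routinely reporting metrics per subgroup, rather than only in aggregate, is one concrete form of the thorough model evaluation the authors call for.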
