Sociodemographic biases in medical decision making by large language models

IF 58.7 · Region 1 (Medicine) · Q1 Biochemistry & Molecular Biology
Mahmud Omar, Shelly Soffer, Reem Agbareia, Nicola Luigi Bragazzi, Donald U. Apakama, Carol R. Horowitz, Alexander W. Charney, Robert Freeman, Benjamin Kummer, Benjamin S. Glicksberg, Girish N. Nadkarni, Eyal Klang
DOI: 10.1038/s41591-025-03626-6 · Nature Medicine · Published: 2025-04-07
Citations: 0

Abstract

Large language models (LLMs) show promise in healthcare, but concerns remain that they may produce medically unjustified clinical care recommendations reflecting the influence of patients’ sociodemographic characteristics. We evaluated nine LLMs, analyzing over 1.7 million model-generated outputs from 1,000 emergency department cases (500 real and 500 synthetic). Each case was presented in 32 variations (31 sociodemographic groups plus a control) while holding clinical details constant. Compared to both a physician-derived baseline and each model’s own control case without sociodemographic identifiers, cases labeled as Black or unhoused or identifying as LGBTQIA+ were more frequently directed toward urgent care, invasive interventions or mental health evaluations. For example, certain cases labeled as being from LGBTQIA+ subgroups were recommended mental health assessments approximately six to seven times more often than clinically indicated. Similarly, cases labeled as having high-income status received significantly more recommendations (P < 0.001) for advanced imaging tests such as computed tomography and magnetic resonance imaging, while low- and middle-income-labeled cases were often limited to basic or no further testing. After applying multiple-hypothesis corrections, these key differences persisted. Their magnitude was not supported by clinical reasoning or guidelines, suggesting that they may reflect model-driven bias, which could eventually lead to health disparities rather than acceptable clinical variation. Our findings, observed in both proprietary and open-source models, underscore the need for robust bias evaluation and mitigation strategies to ensure that LLM-driven medical advice remains equitable and patient centered.
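The design described above (each case presented in 32 variations, 31 sociodemographic groups plus a control, while holding clinical details constant) can be sketched as a prompt-expansion step. The template, label wording, and group list below are illustrative assumptions, not the authors' materials; the paper's full set of 31 labeled groups is not enumerated here.

```python
CASE_TEMPLATE = (
    "Patient presenting to the emergency department: {details}. "
    "{label}What is the recommended triage level and workup?"
)

# Illustrative subset of labels; the study used 31 labeled groups plus
# one unlabeled control, for 32 variants per case.
SOCIODEMOGRAPHIC_LABELS = [
    "",  # control: no sociodemographic identifier
    "The patient is Black. ",
    "The patient is unhoused. ",
    "The patient identifies as LGBTQIA+. ",
    "The patient has high-income status. ",
]


def expand_case(details: str) -> list[str]:
    """Return one prompt per sociodemographic variant of a single case,
    with the clinical details held constant across all variants."""
    return [
        CASE_TEMPLATE.format(details=details, label=label)
        for label in SOCIODEMOGRAPHIC_LABELS
    ]


prompts = expand_case("45-year-old with acute chest pain, stable vitals")
print(len(prompts))  # 1 control + 4 illustrative labeled variants
```

Scaling this pattern to 1,000 cases, 32 variants, and 9 models (with repeated sampling) is consistent with the abstract's figure of over 1.7 million model-generated outputs.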

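The abstract states that the key differences persisted "after applying multiple-hypothesis corrections." The specific procedure is not named here; the Benjamini–Hochberg false-discovery-rate procedure, sketched below, is one common choice and is shown purely as an illustration, not as the authors' code.

```python
def benjamini_hochberg(pvals: list[float], alpha: float = 0.05) -> list[bool]:
    """Return a reject flag per hypothesis, controlling the false
    discovery rate at level alpha (Benjamini-Hochberg procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose sorted rank is <= max_k.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_k:
            reject[i] = True
    return reject


# Toy example: only the smallest p-value survives correction.
flags = benjamini_hochberg([0.001, 0.04, 0.03, 0.6])
print(flags)  # [True, False, False, False]
```

Because the step-up rule rejects everything at or below the largest qualifying rank, a run of small p-values can all survive even when some individually exceed their per-rank threshold.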

Source journal: Nature Medicine (Medicine · Biochemistry & Molecular Biology)
CiteScore: 100.90
Self-citation rate: 0.70%
Annual publications: 525
Review time: 1 month
Journal description: Nature Medicine is a monthly journal publishing original peer-reviewed research in all areas of medicine. The publication focuses on originality, timeliness, interdisciplinary interest, and the impact on improving human health. In addition to research articles, Nature Medicine also publishes commissioned content such as News, Reviews, and Perspectives. This content aims to provide context for the latest advances in translational and clinical research, reaching a wide audience of M.D. and Ph.D. readers. All editorial decisions for the journal are made by a team of full-time professional editors. Nature Medicine considers all types of clinical research, including: case reports and small case series; clinical trials, whether phase 1, 2, 3 or 4; observational studies; meta-analyses; biomarker studies; and public and global health studies. Nature Medicine is also committed to facilitating communication between translational and clinical researchers. As such, we consider "hybrid" studies with preclinical and translational findings reported alongside data from clinical studies.