Generative AI in Medicine: Pioneering Progress or Perpetuating Historical Inaccuracies? Cross-Sectional Study Evaluating Implicit Bias.

JMIR AI Pub Date : 2025-06-24 DOI:10.2196/56891
Philip Sutera, Rohini Bhatia, Timothy Lin, Leslie Chang, Andrea Brown, Reshma Jagsi
JMIR AI. 2025;4:e56891. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12223688/pdf/

Abstract

Background: Generative artificial intelligence (gAI) models, such as DALL-E 2, are promising tools that can generate novel images or artwork from text input. However, caution is warranted: because these tools generate output based on historical data, they risk propagating past learned inequities. Women have routinely been underrepresented in academic and clinical medicine, and the stereotype of a male physician persists.

Objective: The primary objective was to evaluate implicit bias in gAI-generated images across medical specialties.

Methods: To evaluate potential implicit bias, 100 photographs were generated for each medical specialty using the gAI platform DALL-E 2. For each specialty, DALL-E 2 was queried with "An American [specialty name]." The primary endpoint was to compare the gender distribution of the gAI photos to the current distribution of physicians in the United States; the secondary endpoint was to evaluate the racial distribution. gAI photos were classified by perceived gender and race based on unanimous consensus among a diverse group of medical residents. For each specialty, the proportion of women among the gAI-generated subjects was compared with the most recent Association of American Medical Colleges (AAMC) reports on the physician workforce and active residents using χ² analysis.
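A minimal sketch of the χ² comparison described above, using hypothetical numbers (the specialty counts and AAMC shares here are illustrative, not taken from the study): suppose 28 of the 100 generated images for one specialty depicted women, while the workforce report lists a 46% share of women.

```python
import math

# Hypothetical counts for one specialty: women vs men among 100 gAI images.
observed = [28, 72]
# Expected counts implied by an assumed AAMC workforce share of 46% women.
expected = [46.0, 54.0]

# Pearson chi-square goodness-of-fit statistic.
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 2 categories there is 1 degree of freedom; the chi-square survival
# function for df=1 reduces to erfc(sqrt(x/2)), so no SciPy is needed.
p = math.erfc(math.sqrt(stat / 2))

print(f"chi2 = {stat:.2f}, p = {p:.4f}")  # chi2 = 13.04, p = 0.0003
```

With these illustrative counts the generated sample would differ significantly from the workforce benchmark (p < .05), which is the form of comparison the study ran per specialty.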

Results: A total of 1900 photos across 19 medical specialties were generated. Compared with physician workforce data, gAI significantly overrepresented women in 7/19 specialties and underrepresented women in 6/19 specialties. Women were significantly underrepresented relative to the physician workforce by 18%, 18%, and 27% in internal medicine, family medicine, and pediatrics, respectively. Compared with current residents, gAI significantly underrepresented women in 12/19 specialties, by margins ranging from 10% to 36%. Additionally, women represented <50% of the gAI-generated subjects in 17/19 specialties.

Conclusions: gAI created a sample population of physicians that underrepresented women compared with both the resident and the active physician workforce. Steps must be taken to train these models on datasets that represent the diversity of the incoming physician workforce.
