Gender Disparities in Artificial Intelligence–Generated Images of Hospital Leadership in the United States

Mayo Clinic Proceedings. Digital health. Pub Date: 2025-04-08. DOI: 10.1016/j.mcpdig.2025.100218

Mia Gisselbaek MD; Joana Berger-Estilita MD, PhD; Laurens Minsart MD; Ekin Köselerli MD; Arnout Devos PhD; Francisco Maio Matos PhD; Odmara L. Barreto Chang MD, PhD; Peter Dieckmann PhD; Melanie Suppan MD; Sarah Saxena MD, PhD

Volume 3, Issue 2, Article 100218. Journal article. Full text: https://www.sciencedirect.com/science/article/pii/S2949761225000252

Abstract

Objective

To evaluate demographic representation in artificial intelligence (AI)–generated images of hospital leadership roles and compare them with real-world data from US hospitals.

Patients and Methods

This cross-sectional study, conducted from October 1, 2024 to October 31, 2024, analyzed images generated by 3 AI text-to-image models: Midjourney 6.0, OpenAI ChatGPT DALL-E 3, and Google Gemini Imagen 3. Standardized prompts were used to create 1200 images representing 4 key leadership roles: chief executive officers, chief medical officers, chief nursing officers, and chief financial officers. Real-world demographic data from 4397 US hospitals showed that chief executive officers were 73.2% men; chief financial officers, 65.2% men; chief medical officers, 85.7% men; and chief nursing officers, 9.4% men (overall: 60.1% men). The primary outcome was gender representation, with secondary outcomes including race/ethnicity and age. Two independent reviewers assessed images, with interrater reliability evaluated using Cohen κ.
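The interrater reliability step can be illustrated with a minimal sketch of Cohen κ for two reviewers. The function name `cohen_kappa` and the ten example labels are hypothetical and are not taken from the study; the abstract does not describe the software the authors used.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical gender labels for 10 images (illustrative only, not study data).
reviewer_1 = ["man", "man", "woman", "man", "woman", "man", "man", "woman", "man", "man"]
reviewer_2 = ["man", "man", "woman", "man", "man", "man", "man", "woman", "man", "man"]
kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

Because κ subtracts the agreement expected by chance, it is lower than raw percent agreement whenever one label dominates, which is why it is generally preferred for skewed categories such as perceived gender in leadership images.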

Results

Interrater agreement was high for gender (κ=0.998) and moderate for race/ethnicity (κ=0.670) and age (κ=0.605). DALL-E overrepresented men (86.5%) and White individuals (94.5%). Midjourney showed improved gender balance (69.5% men) but overrepresented White individuals (75.0%). Imagen achieved near gender parity (50.3% men) but remained predominantly White (51.5%). Statistically significant differences were observed across models and between models and real-world demographics.
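One way to compare a model's share of men with the real-world benchmark is a one-sample z-test for a proportion, sketched below. This is an illustration under stated assumptions, not the authors' analysis: the abstract does not name the statistical test used, and the sample size of 400 images per model is assumed from 1200 images split evenly across 3 models. The function name `one_sample_proportion_z` is hypothetical.

```python
import math


def one_sample_proportion_z(p_hat, p0, n):
    """Two-sided z-test of an observed proportion p_hat against a benchmark p0."""
    # Standard error under the null hypothesis that the true proportion is p0.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


BENCHMARK_MEN = 0.601      # overall share of men reported for 4397 US hospitals
IMAGES_PER_MODEL = 400     # ASSUMPTION: 1200 images split evenly across 3 models

# Share of men per model, as reported in the Results.
reported = {"DALL-E": 0.865, "Midjourney": 0.695, "Imagen": 0.503}

results = {
    name: one_sample_proportion_z(share, BENCHMARK_MEN, IMAGES_PER_MODEL)
    for name, share in reported.items()
}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.2g}")
```

The overall 60.1% figure depends on the real-world mix of roles, so a role-by-role comparison (for example, chief nursing officers against 9.4% men) would likely be more informative than this pooled sketch.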

Conclusion

Artificial intelligence text-to-image models reflect and amplify systemic biases, overrepresenting men and White leaders, while underrepresenting diversity. Ethical AI practices, including diverse training data sets and fairness-aware algorithms, are essential to ensure equitable representation in health care leadership.

Source journal

Mayo Clinic Proceedings. Digital health (Medicine and Dentistry (General), Health Informatics, Public Health and Health Policy)

Self-citation rate

0.00%

Review time

47 days