Representation of intensivists’ race/ethnicity, sex, and age by artificial intelligence: a cross-sectional study of two text-to-image models
Mia Gisselbaek, Mélanie Suppan, Laurens Minsart, Ekin Köselerli, Sheila Nainan Myatra, Idit Matot, Odmara L. Barreto Chang, Sarah Saxena, Joana Berger-Estilita
Critical Care. Published 2024-11-11. DOI: 10.1186/s13054-024-05134-4 (https://doi.org/10.1186/s13054-024-05134-4)
Abstract
Integrating artificial intelligence (AI) into intensive care practice can enhance patient care by providing real-time predictions and aiding clinical decisions. However, biases in AI models can undermine diversity, equity, and inclusion (DEI) efforts, particularly in visual representations of healthcare professionals. This work examines the demographic representation produced by two AI text-to-image models, Midjourney and ChatGPT DALL-E 2, and assesses their accuracy in depicting the demographic characteristics of intensivists. This cross-sectional study, conducted from May to July 2024, used demographic data from the USA workforce report (2022) and intensive care trainees (2021) to compare real-world intensivist demographics with images generated by Midjourney v6.0 and ChatGPT 4.0 DALL-E 2. A total of 1,400 images were generated across ICU subspecialties; the outcomes were the comparison of sex, race/ethnicity, and age representation in the AI-generated images with actual workforce demographics. Both models showed biases relative to the actual U.S. intensive care workforce data, notably overrepresenting White and young doctors. ChatGPT DALL-E 2 produced fewer female (17.3% vs 32.2%, p < 0.0001), more White (61% vs 55.1%, p = 0.002), and younger (53.3% vs 23.9%, p < 0.001) individuals. Midjourney depicted more female (47.6% vs 32.2%, p < 0.001), more White (60.9% vs 55.1%, p = 0.003), and younger (49.3% vs 23.9%, p < 0.001) intensivists. Substantial differences between specialties were observed within both models. Finally, when compared directly, the two models differed significantly in their portrayal of intensivists. Significant biases in AI images of intensivists generated by ChatGPT DALL-E 2 and Midjourney reflect broader cultural issues and may perpetuate stereotypes of healthcare workers within society. This study highlights the need for an approach that ensures fairness, accountability, transparency, and ethics in AI applications for healthcare.
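The abstract reports each comparison as an observed share in the generated images against a workforce reference share, but it does not name the statistical test or the number of images per model. The following is a minimal sketch of one way such a comparison could be checked, assuming a one-sample z-test for a proportion. The function name `one_sample_proportion_ztest`, the sample size of 700 images per model (1,400 split evenly), and the count of 121 female-presenting images (about 17.3% of 700) are illustrative assumptions, not figures taken from the study.

```python
import math


def one_sample_proportion_ztest(successes, n, p0):
    """Two-sided z-test of an observed proportion against a reference p0.

    Returns (z, p_value). Uses the normal approximation with the
    standard error computed under the null hypothesis.
    """
    if n <= 0 or not 0.0 < p0 < 1.0:
        raise ValueError("n must be positive and p0 strictly between 0 and 1")
    p_hat = successes / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


if __name__ == "__main__":
    # ASSUMPTION: 700 images for one model; 121 depict women (~17.3%).
    # The reference share of 32.2% is the workforce figure from the abstract.
    z, p = one_sample_proportion_ztest(successes=121, n=700, p0=0.322)
    print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

Under these assumed counts the observed share falls well below the reference, which agrees in direction with the reported result. The exact p-value depends on the true sample size and on the test the authors actually used.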
About the journal:
Critical Care is an international medical journal whose articles undergo rigorous peer review to maintain high quality standards. Its primary objective is to improve the care of critically ill patients. To that end, the journal gathers, exchanges, disseminates, and endorses evidence-based information that is highly relevant to intensivists, with the aim of providing a thorough and inclusive examination of the intensive care field.