Ensuring Appropriate Representation in Artificial Intelligence-Generated Medical Imagery: Protocol for a Methodological Approach to Address Skin Tone Bias.

JMIR AI Pub Date: 2024-11-27 DOI: 10.2196/58275
Andrew O'Malley, Miriam Veenhuizen, Ayla Ahmed
{"title":"确保人工智能生成的医学图像具有适当的代表性:解决肤色偏差的方法协议》。","authors":"Andrew O'Malley, Miriam Veenhuizen, Ayla Ahmed","doi":"10.2196/58275","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>In medical education, particularly in anatomy and dermatology, generative artificial intelligence (AI) can be used to create customized illustrations. However, the underrepresentation of darker skin tones in medical textbooks and elsewhere, which serve as training data for AI, poses a significant challenge in ensuring diverse and inclusive educational materials.</p><p><strong>Objective: </strong>This study aims to evaluate the extent of skin tone diversity in AI-generated medical images and to test whether the representation of skin tones can be improved by modifying AI prompts to better reflect the demographic makeup of the US population.</p><p><strong>Methods: </strong>In total, 2 standard AI models (Dall-E [OpenAI] and Midjourney [Midjourney Inc]) each generated 100 images of people with psoriasis. In addition, a custom model was developed that incorporated a prompt injection aimed at \"forcing\" the AI (Dall-E 3) to reflect the skin tone distribution of the US population according to the 2012 American National Election Survey. This custom model generated another set of 100 images. The skin tones in these images were assessed by 3 researchers using the New Immigrant Survey skin tone scale, with the median value representing each image. A chi-square goodness of fit analysis compared the skin tone distributions from each set of images to that of the US population.</p><p><strong>Results: </strong>The standard AI models (Dalle-3 and Midjourney) demonstrated a significant difference between the expected skin tones of the US population and the observed tones in the generated images (P<.001). Both standard AI models overrepresented lighter skin. Conversely, the custom model with the modified prompt yielded a distribution of skin tones that closely matched the expected demographic representation, showing no significant difference (P=.04).</p><p><strong>Conclusions: </strong>This study reveals a notable bias in AI-generated medical images, predominantly underrepresenting darker skin tones. This bias can be effectively addressed by modifying AI prompts to incorporate real-life demographic distributions. The findings emphasize the need for conscious efforts in AI development to ensure diverse and representative outputs, particularly in educational and medical contexts. Users of generative AI tools should be aware that these biases exist, and that similar tendencies may also exist in other types of generative AI (eg, large language models) and in other characteristics (eg, sex, gender, culture, and ethnicity). 
Injecting demographic data into AI prompts may effectively counteract these biases, ensuring a more accurate representation of the general population.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"3 ","pages":"e58275"},"PeriodicalIF":0.0000,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11635324/pdf/","citationCount":"0","resultStr":"{\"title\":\"Ensuring Appropriate Representation in Artificial Intelligence-Generated Medical Imagery: Protocol for a Methodological Approach to Address Skin Tone Bias.\",\"authors\":\"Andrew O'Malley, Miriam Veenhuizen, Ayla Ahmed\",\"doi\":\"10.2196/58275\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>In medical education, particularly in anatomy and dermatology, generative artificial intelligence (AI) can be used to create customized illustrations. However, the underrepresentation of darker skin tones in medical textbooks and elsewhere, which serve as training data for AI, poses a significant challenge in ensuring diverse and inclusive educational materials.</p><p><strong>Objective: </strong>This study aims to evaluate the extent of skin tone diversity in AI-generated medical images and to test whether the representation of skin tones can be improved by modifying AI prompts to better reflect the demographic makeup of the US population.</p><p><strong>Methods: </strong>In total, 2 standard AI models (Dall-E [OpenAI] and Midjourney [Midjourney Inc]) each generated 100 images of people with psoriasis. In addition, a custom model was developed that incorporated a prompt injection aimed at \\\"forcing\\\" the AI (Dall-E 3) to reflect the skin tone distribution of the US population according to the 2012 American National Election Survey. This custom model generated another set of 100 images. The skin tones in these images were assessed by 3 researchers using the New Immigrant Survey skin tone scale, with the median value representing each image. A chi-square goodness of fit analysis compared the skin tone distributions from each set of images to that of the US population.</p><p><strong>Results: </strong>The standard AI models (Dalle-3 and Midjourney) demonstrated a significant difference between the expected skin tones of the US population and the observed tones in the generated images (P<.001). Both standard AI models overrepresented lighter skin. Conversely, the custom model with the modified prompt yielded a distribution of skin tones that closely matched the expected demographic representation, showing no significant difference (P=.04).</p><p><strong>Conclusions: </strong>This study reveals a notable bias in AI-generated medical images, predominantly underrepresenting darker skin tones. This bias can be effectively addressed by modifying AI prompts to incorporate real-life demographic distributions. The findings emphasize the need for conscious efforts in AI development to ensure diverse and representative outputs, particularly in educational and medical contexts. Users of generative AI tools should be aware that these biases exist, and that similar tendencies may also exist in other types of generative AI (eg, large language models) and in other characteristics (eg, sex, gender, culture, and ethnicity). 
Injecting demographic data into AI prompts may effectively counteract these biases, ensuring a more accurate representation of the general population.</p>\",\"PeriodicalId\":73551,\"journal\":{\"name\":\"JMIR AI\",\"volume\":\"3 \",\"pages\":\"e58275\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-11-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11635324/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"JMIR AI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2196/58275\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/58275","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract


Background: In medical education, particularly in anatomy and dermatology, generative artificial intelligence (AI) can be used to create customized illustrations. However, the underrepresentation of darker skin tones in medical textbooks and elsewhere, which serve as training data for AI, poses a significant challenge in ensuring diverse and inclusive educational materials.

Objective: This study aims to evaluate the extent of skin tone diversity in AI-generated medical images and to test whether the representation of skin tones can be improved by modifying AI prompts to better reflect the demographic makeup of the US population.

Methods: In total, 2 standard AI models (Dall-E [OpenAI] and Midjourney [Midjourney Inc]) each generated 100 images of people with psoriasis. In addition, a custom model was developed that incorporated a prompt injection aimed at "forcing" the AI (Dall-E 3) to reflect the skin tone distribution of the US population according to the 2012 American National Election Survey. This custom model generated another set of 100 images. The skin tones in these images were assessed by 3 researchers using the New Immigrant Survey skin tone scale, with the median of the 3 ratings representing each image. A chi-square goodness-of-fit analysis compared the skin tone distribution from each set of images to that of the US population.
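
The abstract does not reproduce the injected prompt text or the exact ANES-derived tone proportions, so the following Python sketch only illustrates the general prompt-injection idea: sample a skin tone descriptor from a target distribution and append it to a base prompt before each generation request. The category labels, weights, and prompt wording here are hypothetical placeholders, not the authors' protocol.

```python
import random

# Hypothetical five-category tone distribution; the study instead used
# proportions from the 2012 American National Election Survey, which
# the abstract does not reproduce.
SKIN_TONE_DISTRIBUTION = {
    "very light skin": 0.30,
    "light skin": 0.25,
    "medium skin": 0.20,
    "dark skin": 0.15,
    "very dark skin": 0.10,
}

BASE_PROMPT = "A medical illustration of a person with psoriasis"

def build_prompt(rng: random.Random) -> str:
    """Append a tone descriptor sampled from the target distribution."""
    tones = list(SKIN_TONE_DISTRIBUTION)
    weights = list(SKIN_TONE_DISTRIBUTION.values())
    tone = rng.choices(tones, weights=weights, k=1)[0]
    return f"{BASE_PROMPT}, {tone}"

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible demo
    prompts = [build_prompt(rng) for _ in range(100)]  # one per image
    print(prompts[:3])
```

Because the descriptor is resampled per image, the 100 generated prompts approximate the target distribution in aggregate without constraining any single image.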

Results: The standard AI models (Dall-E and Midjourney) demonstrated a significant difference between the expected skin tones of the US population and the observed tones in the generated images (P<.001). Both standard AI models overrepresented lighter skin. Conversely, the custom model with the modified prompt yielded a distribution of skin tones that closely matched the expected demographic representation, showing no significant difference (P=.04).
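
The P values above come from the chi-square goodness-of-fit test described in the Methods. Below is a minimal sketch of that comparison, assuming made-up observed counts and placeholder population proportions; the study's actual ANES figures and rating data are not given in the abstract.

```python
from scipy.stats import chisquare

# Each of the 100 images gets one value: the median of the 3
# researchers' ratings on the New Immigrant Survey scale. Here those
# medians are assumed to have been binned into five tone categories,
# lightest to darkest (illustrative counts, not the study's data).
observed = [48, 27, 14, 8, 3]

# Expected counts under an assumed US population distribution
# (placeholder proportions standing in for the 2012 ANES figures).
population_props = [0.30, 0.25, 0.20, 0.15, 0.10]
expected = [p * sum(observed) for p in population_props]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, P = {p_value:.4f}")
# A small P value, as reported for the standard models, indicates the
# generated tone distribution departs from the population benchmark.
```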

Conclusions: This study reveals a notable bias in AI-generated medical images, predominantly underrepresenting darker skin tones. This bias can be effectively addressed by modifying AI prompts to incorporate real-life demographic distributions. The findings emphasize the need for conscious efforts in AI development to ensure diverse and representative outputs, particularly in educational and medical contexts. Users of generative AI tools should be aware that these biases exist, and that similar tendencies may also exist in other types of generative AI (eg, large language models) and in other characteristics (eg, sex, gender, culture, and ethnicity). Injecting demographic data into AI prompts may effectively counteract these biases, ensuring a more accurate representation of the general population.
