大型语言模型的固有偏差：随机抽样分析

Mayo Clinic Proceedings. Digital health Pub Date : 2024-04-11 DOI:10.1016/j.mcpdig.2024.03.003

Noel F. Ayoub MD, MBA , Karthik Balakrishnan MD, MPH , Marc S. Ayoub MD , Thomas F. Barrett MD , Abel P. David MD , Stacey T. Gray MD

{"title":"大型语言模型的固有偏差：随机抽样分析","authors":"Noel F. Ayoub MD, MBA , Karthik Balakrishnan MD, MPH , Marc S. Ayoub MD , Thomas F. Barrett MD , Abel P. David MD , Stacey T. Gray MD","doi":"10.1016/j.mcpdig.2024.03.003","DOIUrl":null,"url":null,"abstract":"<div><p>There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (<em>P</em><.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. Digital health","volume":"2 2","pages":"Pages 186-191"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949761224000208/pdfft?md5=895559f96cdc78e7afbad43c7d8d164a&pid=1-s2.0-S2949761224000208-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Inherent Bias in Large Language Models: A Random Sampling Analysis\",\"authors\":\"Noel F. Ayoub MD, MBA , Karthik Balakrishnan MD, MPH , Marc S. Ayoub MD , Thomas F. Barrett MD , Abel P. David MD , Stacey T. Gray MD\",\"doi\":\"10.1016/j.mcpdig.2024.03.003\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (<em>P</em><.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.</p></div>\",\"PeriodicalId\":74127,\"journal\":{\"name\":\"Mayo Clinic Proceedings. Digital health\",\"volume\":\"2 2\",\"pages\":\"Pages 186-191\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2949761224000208/pdfft?md5=895559f96cdc78e7afbad43c7d8d164a&pid=1-s2.0-S2949761224000208-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Mayo Clinic Proceedings. Digital health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949761224000208\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mayo Clinic Proceedings. Digital health","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949761224000208","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

人们越来越关注大型语言模型（LLM）的内在偏差、安全性和误导倾向，这可能会对医疗保健产生重大影响。本研究试图确定，在资源匮乏的环境中，基于人工智能（AI）生成的模拟医生生死决策是否会出现偏差。研究人员提出了 13 个问题，模拟医生在资源有限的环境中治疗病人。通过使用 OpenAI 的生成式预训练转换器（GPT-4）对模拟医生进行随机抽样，医生的任务是在资源有限的情况下只选择救治一名病人。每个问题重复模拟 1000 次，每个问题代表 1000 个不同的医生和患者。患者和医生的人口统计学特征各不相同。所有患者在急性病中存活的先验可能性相似。总体而言，模拟医生在临床决策中始终表现出种族、性别、年龄、政治派别和性取向偏见。在所有人口统计学特征中，医生最倾向于选择与自己人口统计学特征相似的患者，大多数配对比较结果显示出统计学意义（P<.05）。无特征的医生偏爱白人、男性和年轻的人口特征。男医生偏爱男性、白人和年轻患者，而女医生通常偏爱女性、年轻和白人患者。民主党医生除了喜欢自己政治派别的病人外，还偏爱黑人和女性病人，而共和党医生则偏爱白人和男性人口特征。异性恋和男同性恋/女同性恋医生经常救治性取向相似的病人。总的来说，公开可用的聊天机器人 LLM 显示出明显的偏见，如果不采取适当的预防措施将其用于支持临床护理决策，可能会对患者的治疗效果产生负面影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Inherent Bias in Large Language Models: A Random Sampling Analysis

There are mounting concerns regarding inherent bias, safety, and tendency toward misinformation of large language models (LLMs), which could have significant implications in health care. This study sought to determine whether generative artificial intelligence (AI)-based simulations of physicians making life-and-death decisions in a resource-scarce environment would demonstrate bias. Thirteen questions were developed that simulated physicians treating patients in resource-limited environments. Through a random sampling of simulated physicians using OpenAI’s generative pretrained transformer (GPT-4), physicians were tasked with choosing only 1 patient to save owing to limited resources. This simulation was repeated 1000 times per question, representing 1000 unique physicians and patients each. Patients and physicians spanned a variety of demographic characteristics. All patients had similar a priori likelihood of surviving the acute illness. Overall, simulated physicians consistently demonstrated racial, gender, age, political affiliation, and sexual orientation bias in clinical decision-making. Across all demographic characteristics, physicians most frequently favored patients with similar demographic characteristics as themselves, with most pairwise comparisons showing statistical significance (P<.05). Nondescript physicians favored White, male, and young demographic characteristics. The male doctor gravitated toward the male, White, and young, whereas the female doctor typically preferred female, young, and White patients. In addition to saving patients with their own political affiliation, Democratic physicians favored Black and female patients, whereas Republicans preferred White and male demographic characteristics. Heterosexual and gay/lesbian physicians frequently saved patients of similar sexual orientation. Overall, publicly available chatbot LLMs demonstrate significant biases, which may negatively impact patient outcomes if used to support clinical care decisions without appropriate precautions.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Mayo Clinic Proceedings. Digital health Medicine and Dentistry (General), Health Informatics, Public Health and Health Policy

自引率

0.00%

发文量

审稿时长

47 days