Racial, ethnic, and sex bias in large language model opioid recommendations for pain management.

IF 5.9 · Zone 1, Medicine · JCR Q1, Anesthesiology

PAIN® · Pub Date: 2024-09-06 · DOI: 10.1097/j.pain.0000000000003388

Cameron C Young, Elizabeth Enichen, Arya Rao, Marc D Succi


Citations: 0

Abstract

Understanding how large language model (LLM) recommendations vary with patient race/ethnicity provides insight into how LLMs may counter or compound bias in opioid prescription. Forty real-world patient cases with chief complaints of abdominal pain, back pain, headache, or musculoskeletal pain were sourced from the MIMIC-IV Note dataset and amended to include all combinations of race/ethnicity and sex. Large language models were instructed to provide a subjective pain rating and a comprehensive pain management recommendation. Univariate analyses were performed to evaluate the association between racial/ethnic group or sex and the specified outcome measures (subjective pain rating, opioid name, order, and dosage recommendations) suggested by 2 LLMs (GPT-4 and Gemini). Four hundred eighty real-world patient cases were provided to each LLM, and responses included pharmacologic and nonpharmacologic interventions. Tramadol was the most frequently recommended weak opioid (55.4% of cases), while oxycodone was the most frequently recommended strong opioid (33.2% of cases). Relative to GPT-4, Gemini was more likely to rate a patient's pain as "severe" (OR: 0.57; 95% CI: [0.54, 0.60]; P < 0.001), recommend strong opioids (OR: 2.05; 95% CI: [1.59, 2.66]; P < 0.001), and recommend opioids later (OR: 1.41; 95% CI: [1.22, 1.62]; P < 0.001). Race/ethnicity and sex did not influence LLM recommendations. This study suggests that LLMs do not preferentially recommend opioid treatment for one group over another. Given that prior research shows race-based disparities in pain perception and treatment by healthcare providers, LLMs may offer physicians a helpful tool to guide their pain management and ensure equitable treatment across patient groups.
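For readers unfamiliar with the odds ratios (OR) and 95% confidence intervals quoted in the abstract, the following is a minimal sketch of how such a figure can be derived from a 2×2 table of counts. It uses the standard Wald interval on the log-odds scale; the abstract does not state which estimation method the authors used, so this is an assumption. The function name `odds_ratio_ci` and its cell labels `a`, `b`, `c`, `d` are hypothetical and chosen for illustration only.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)

    Illustrative only: not necessarily the method used in the study.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With this orientation, an OR above 1 means higher odds of the outcome in group 1 than in group 2, and an OR below 1 means lower odds, so the choice of reference group matters when reading a reported value. An interval that excludes 1 corresponds to a statistically significant difference at roughly the 5% level.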


Source journal

PAIN® (Medicine: Clinical Neurology)

CiteScore: 12.50

Self-citation rate: 8.10%

Articles published: 242

Review time: 9 months

Journal description: PAIN® is the official publication of the International Association for the Study of Pain and publishes original research on the nature, mechanisms, and treatment of pain. PAIN® provides a forum for the dissemination of research in the basic and clinical sciences of multidisciplinary interest.