比较生成人工智能和心理健康专业人员对创伤暴露人群的临床决策:基于小插曲的实验研究。

IF 5.8 2区 医学 Q1 PSYCHIATRY
Jmir Mental Health Pub Date : 2025-10-14 DOI:10.2196/80801
Katherine E Wislocki, Sabahat Sami, Gahl Liberzon, Alyson K Zalta
{"title":"比较生成人工智能和心理健康专业人员对创伤暴露人群的临床决策:基于小插曲的实验研究。","authors":"Katherine E Wislocki, Sabahat Sami, Gahl Liberzon, Alyson K Zalta","doi":"10.2196/80801","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Trauma exposure is highly prevalent and associated with various health issues. However, health care professionals can exhibit trauma-related diagnostic overshadowing bias, leading to misdiagnosis and inadequate treatment of trauma-exposed populations. Generative artificial intelligence (GAI) models are increasingly used in health care contexts. No research has examined whether GAI demonstrates this bias in decision-making and how rates of this bias may compare to mental health professionals (MHPs).</p><p><strong>Objective: </strong>This study aimed to assess trauma-related diagnostic overshadowing among frontier GAI models and compare evidence of trauma-related diagnostic overshadowing between frontier GAI models and MHPs.</p><p><strong>Methods: </strong>MHPs (N=232; mean [SD] age 43.7 [15.95] years) completed an experimental paradigm consisting of 2 vignettes describing adults presenting with obsessive-compulsive symptoms or substance abuse symptoms. One vignette included a trauma exposure history (ie, sexual trauma or physical trauma), and one vignette did not include a trauma exposure history. Participants answered questions about their preferences for diagnosis and treatment options for clients within the vignettes. GAI models (eg, Gemini 1.5 Flash, ChatGPT-4o mini, Claude Sonnet, and Meta Llama 3) completed the same experimental paradigm, with each block being reviewed by each GAI model 20 times. Mann-Whitney U tests and chi-square analyses were used to assess diagnostic and treatment decision-making across vignette factors and respondents.</p><p><strong>Results: </strong>GAI models, similar to MHPs, demonstrated some evidence of trauma-related diagnostic overshadowing bias, particularly in Likert-based ratings of posttraumatic stress disorder diagnosis and treatment when sexual trauma was present (P<.001). However, GAI models generally exhibited significantly less bias than MHPs across both Likert and forced-choice clinical decision tasks. Compared to MHPs, GAI models assigned higher ratings for the target diagnosis and treatment in obsessive-compulsive disorder vignettes (rb=0.43-0.63; P<.001) and for the target treatment in substance use disorder vignettes (rb=0.57; P<.001) when trauma was present. In forced-choice tasks, GAI models were significantly more accurate than MHPs in selecting the correct diagnosis and treatment for obsessive-compulsive disorder vignettes (χ²1=48.84-61.07; P<.001) and for substance use disorder vignettes involving sexual trauma (χ²1=15.17-101.61; P<.001).</p><p><strong>Conclusions: </strong>GAI models demonstrate some evidence of trauma-related diagnostic overshadowing bias, yet the degree of bias varied by task and model. Moreover, GAI models generally demonstrated less bias than MHPs in this experimental paradigm. These findings highlight the importance of understanding GAI biases in mental health care. More research into bias reduction strategies and responsible implementation of GAI models in mental health care is needed.</p>","PeriodicalId":48616,"journal":{"name":"Jmir Mental Health","volume":"12 ","pages":"e80801"},"PeriodicalIF":5.8000,"publicationDate":"2025-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527320/pdf/","citationCount":"0","resultStr":"{\"title\":\"Comparing Generative Artificial Intelligence and Mental Health Professionals for Clinical Decision-Making With Trauma-Exposed Populations: Vignette-Based Experimental Study.\",\"authors\":\"Katherine E Wislocki, Sabahat Sami, Gahl Liberzon, Alyson K Zalta\",\"doi\":\"10.2196/80801\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Trauma exposure is highly prevalent and associated with various health issues. However, health care professionals can exhibit trauma-related diagnostic overshadowing bias, leading to misdiagnosis and inadequate treatment of trauma-exposed populations. Generative artificial intelligence (GAI) models are increasingly used in health care contexts. No research has examined whether GAI demonstrates this bias in decision-making and how rates of this bias may compare to mental health professionals (MHPs).</p><p><strong>Objective: </strong>This study aimed to assess trauma-related diagnostic overshadowing among frontier GAI models and compare evidence of trauma-related diagnostic overshadowing between frontier GAI models and MHPs.</p><p><strong>Methods: </strong>MHPs (N=232; mean [SD] age 43.7 [15.95] years) completed an experimental paradigm consisting of 2 vignettes describing adults presenting with obsessive-compulsive symptoms or substance abuse symptoms. One vignette included a trauma exposure history (ie, sexual trauma or physical trauma), and one vignette did not include a trauma exposure history. Participants answered questions about their preferences for diagnosis and treatment options for clients within the vignettes. GAI models (eg, Gemini 1.5 Flash, ChatGPT-4o mini, Claude Sonnet, and Meta Llama 3) completed the same experimental paradigm, with each block being reviewed by each GAI model 20 times. Mann-Whitney U tests and chi-square analyses were used to assess diagnostic and treatment decision-making across vignette factors and respondents.</p><p><strong>Results: </strong>GAI models, similar to MHPs, demonstrated some evidence of trauma-related diagnostic overshadowing bias, particularly in Likert-based ratings of posttraumatic stress disorder diagnosis and treatment when sexual trauma was present (P<.001). However, GAI models generally exhibited significantly less bias than MHPs across both Likert and forced-choice clinical decision tasks. Compared to MHPs, GAI models assigned higher ratings for the target diagnosis and treatment in obsessive-compulsive disorder vignettes (rb=0.43-0.63; P<.001) and for the target treatment in substance use disorder vignettes (rb=0.57; P<.001) when trauma was present. In forced-choice tasks, GAI models were significantly more accurate than MHPs in selecting the correct diagnosis and treatment for obsessive-compulsive disorder vignettes (χ²1=48.84-61.07; P<.001) and for substance use disorder vignettes involving sexual trauma (χ²1=15.17-101.61; P<.001).</p><p><strong>Conclusions: </strong>GAI models demonstrate some evidence of trauma-related diagnostic overshadowing bias, yet the degree of bias varied by task and model. Moreover, GAI models generally demonstrated less bias than MHPs in this experimental paradigm. These findings highlight the importance of understanding GAI biases in mental health care. More research into bias reduction strategies and responsible implementation of GAI models in mental health care is needed.</p>\",\"PeriodicalId\":48616,\"journal\":{\"name\":\"Jmir Mental Health\",\"volume\":\"12 \",\"pages\":\"e80801\"},\"PeriodicalIF\":5.8000,\"publicationDate\":\"2025-10-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12527320/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Jmir Mental Health\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.2196/80801\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"PSYCHIATRY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jmir Mental Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/80801","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHIATRY","Score":null,"Total":0}
引用次数: 0

摘要

背景:创伤暴露非常普遍,并与各种健康问题有关。然而,卫生保健专业人员可能会表现出与创伤相关的诊断掩盖偏见,导致对创伤暴露人群的误诊和治疗不足。生成式人工智能(GAI)模型越来越多地用于卫生保健环境。没有研究检验过GAI是否在决策中表现出这种偏见,以及这种偏见的比率与心理健康专业人员(MHPs)相比如何。目的:本研究旨在评估前沿GAI模型中创伤相关的诊断阴影,并比较前沿GAI模型和MHPs之间创伤相关诊断阴影的证据。方法:MHPs (N=232,平均[SD]年龄43.7[15.95]岁)完成了一个由2个小片段组成的实验范式,描述了出现强迫症症状或药物滥用症状的成年人。一个小插曲包括创伤暴露史(即性创伤或身体创伤),另一个小插曲不包括创伤暴露史。参与者在小短片中回答了关于他们对客户诊断和治疗方案的偏好的问题。GAI模型(如Gemini 1.5 Flash、chatgpt - 40 mini、Claude Sonnet和Meta Llama 3)完成了相同的实验范式,每个GAI模型对每个块进行了20次审查。曼-惠特尼检验和卡方分析用于评估小插曲因素和被调查者之间的诊断和治疗决策。结果:与MHPs类似,GAI模型显示出一些创伤相关的诊断阴影偏倚的证据,特别是在存在性创伤的创伤后应激障碍诊断和治疗的李克特评分中(结论:GAI模型显示出一些创伤相关的诊断阴影偏倚的证据,但偏倚的程度因任务和模型而异。此外,在该实验范式中,GAI模型通常比MHPs表现出更小的偏差。这些发现强调了理解GAI偏见在精神卫生保健中的重要性。需要对减少偏见策略和负责任的GAI模型在精神卫生保健中的实施进行更多的研究。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Comparing Generative Artificial Intelligence and Mental Health Professionals for Clinical Decision-Making With Trauma-Exposed Populations: Vignette-Based Experimental Study.

Background: Trauma exposure is highly prevalent and associated with various health issues. However, health care professionals can exhibit trauma-related diagnostic overshadowing bias, leading to misdiagnosis and inadequate treatment of trauma-exposed populations. Generative artificial intelligence (GAI) models are increasingly used in health care contexts. No research has examined whether GAI demonstrates this bias in decision-making and how rates of this bias may compare to mental health professionals (MHPs).

Objective: This study aimed to assess trauma-related diagnostic overshadowing among frontier GAI models and compare evidence of trauma-related diagnostic overshadowing between frontier GAI models and MHPs.

Methods: MHPs (N=232; mean [SD] age 43.7 [15.95] years) completed an experimental paradigm consisting of 2 vignettes describing adults presenting with obsessive-compulsive symptoms or substance abuse symptoms. One vignette included a trauma exposure history (ie, sexual trauma or physical trauma), and one vignette did not include a trauma exposure history. Participants answered questions about their preferences for diagnosis and treatment options for clients within the vignettes. GAI models (eg, Gemini 1.5 Flash, ChatGPT-4o mini, Claude Sonnet, and Meta Llama 3) completed the same experimental paradigm, with each block being reviewed by each GAI model 20 times. Mann-Whitney U tests and chi-square analyses were used to assess diagnostic and treatment decision-making across vignette factors and respondents.

Results: GAI models, similar to MHPs, demonstrated some evidence of trauma-related diagnostic overshadowing bias, particularly in Likert-based ratings of posttraumatic stress disorder diagnosis and treatment when sexual trauma was present (P<.001). However, GAI models generally exhibited significantly less bias than MHPs across both Likert and forced-choice clinical decision tasks. Compared to MHPs, GAI models assigned higher ratings for the target diagnosis and treatment in obsessive-compulsive disorder vignettes (rb=0.43-0.63; P<.001) and for the target treatment in substance use disorder vignettes (rb=0.57; P<.001) when trauma was present. In forced-choice tasks, GAI models were significantly more accurate than MHPs in selecting the correct diagnosis and treatment for obsessive-compulsive disorder vignettes (χ²1=48.84-61.07; P<.001) and for substance use disorder vignettes involving sexual trauma (χ²1=15.17-101.61; P<.001).

Conclusions: GAI models demonstrate some evidence of trauma-related diagnostic overshadowing bias, yet the degree of bias varied by task and model. Moreover, GAI models generally demonstrated less bias than MHPs in this experimental paradigm. These findings highlight the importance of understanding GAI biases in mental health care. More research into bias reduction strategies and responsible implementation of GAI models in mental health care is needed.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Jmir Mental Health
Jmir Mental Health Medicine-Psychiatry and Mental Health
CiteScore
10.80
自引率
3.80%
发文量
104
审稿时长
16 weeks
期刊介绍: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focusses on digital health and Internet interventions, technologies and electronic innovations (software and hardware) for mental health, addictions, online counselling and behaviour change. This includes formative evaluation and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信