Feasibility of Using ChatGPT to Generate Exposure Hierarchies for Treating Obsessive-Compulsive Disorder

IF 3.8, Zone 2 (Psychology), JCR Q2 (Psychiatry)

Behavior Therapy Pub Date : 2025-03-10 DOI:10.1016/j.beth.2025.02.005

Emily E. Bernstein, Adam C. Jaroszewski, Ryan J. Jacoby, Natasha H. Bailen, Jennifer Ragan, Aisha Usmani, Sabine Wilhelm

Behavior Therapy, Volume 56, Issue 4, Pages 680-688. Full text: https://www.sciencedirect.com/science/article/pii/S0005789425000231

Citations: 0

Abstract

Obsessive-compulsive disorder (OCD) is a chronic, severe condition. Although exposure and response prevention (ERP), the first-line treatment for OCD, is highly effective, too few clinicians are equipped to deliver it. One barrier is the time and expertise required to develop personalized exposure hierarchies. In this study, we examined the feasibility and promise of using large language models (LLMs) to generate appropriate exposure suggestions for OCD treatment. We used ChatGPT-4 (Generative Pretrained Transformer, Version 4) to generate 10-item exposure hierarchies for simulated patient cases that were systematically varied along the following dimensions: OCD subtype, symptom complexity or number, level of detail, patient age, and patient gender. Expert clinicians also generated hierarchies for a subset of prompts. ChatGPT-generated hierarchies were first rated for completeness and degree to which input information was incorporated. Three OCD experts blinded to the aims of the study then rated each ChatGPT- and expert-generated hierarchy’s appropriateness, specificity, variability, safety/ethics, and overall usefulness or quality. ChatGPT generated partial (n = 15) or complete (n = 55) responses to 70 of 72 prompts and incorporated most input information (M = 4.29 out of 5, SD = 0.85). The only significant predictor of degree of input information incorporated was number of OCD symptoms; prompts with the most symptoms were rated as incorporating less input information than prompts with both low and moderate number of symptoms, ps < .05. Overall, ChatGPT-generated hierarchies were viewed as appropriate (M = 4.47, SD = 0.58), specific (M = 4.17, SD = 0.65), variable (M = 3.96, SD = 0.79), safe/ethical (M = 4.89, SD = 0.24), and useful (M = 3.99, SD = 0.82). 
However, expert human-generated hierarchies were still rated as significantly more appropriate, specific, variable, and useful, ps < .05, but not more or less safe and ethical than ChatGPT-generated hierarchies, p = .24. Only the level of symptom detail included in prompts was associated with ratings of ChatGPT-generated hierarchies, ps < .05, such that hierarchies were rated significantly better when prompts had been more detailed. Results suggest that LLMs such as ChatGPT hold great promise in helping generate effective OCD exposure hierarchies, while also highlighting key limitations that require resolution prior to clinical implementation. Given that few clinicians specialize in OCD treatment, it would be advantageous to establish how face-to-face or digital treatment can be augmented with this technology.

Source journal

Behavior Therapy Multiple-

CiteScore: 7.40

Self-citation rate: 2.70%

Articles published: 113

Review time: 121 days

Journal description: Behavior Therapy is a quarterly international journal devoted to the application of the behavioral and cognitive sciences to the conceptualization, assessment, and treatment of psychopathology and related clinical problems. It is intended for mental health professionals and students from all related disciplines who wish to remain current in these areas and provides a vehicle for scientist-practitioners and clinical scientists to report the results of their original empirical research. Although the major emphasis is placed upon empirical research, methodological and theoretical papers as well as evaluative reviews of the literature will also be published. Controlled single-case designs and clinical replication series are welcome.