A Comparison of Responses from Human Therapists and Large Language Model-Based Chatbots to Assess Therapeutic Communication: Mixed Methods Study.

IF 4.8 | CAS Region 2 (Medicine) | Q1 PSYCHIATRY
JMIR Mental Health | Pub Date: 2025-05-21 | DOI: 10.2196/69709
Till Scholich, Maya Barr, Shannon Wiltsey Stirman, Shriti Raj
{"title":"A Comparison of Responses from Human Therapists and Large Language Model-Based Chatbots to Assess Therapeutic Communication: Mixed Methods Study.","authors":"Till Scholich, Maya Barr, Shannon Wiltsey Stirman, Shriti Raj","doi":"10.2196/69709","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Consumers are increasingly using large language model-based chatbots to seek mental health advice or intervention due to ease of access and limited availability of mental health professionals. However, their suitability and safety for mental health applications remain underexplored, particularly in comparison to professional therapeutic practices.</p><p><strong>Objective: </strong>This study aimed to evaluate how general-purpose chatbots respond to mental health scenarios and compare their responses to those provided by licensed therapists. Specifically, we sought to identify chatbots' strengths and limitations, as well as the ethical and practical considerations necessary for their use in mental health care.</p><p><strong>Methods: </strong>We conducted a mixed methods study to compare responses from chatbots and licensed therapists to scripted mental health scenarios. We created 2 fictional scenarios and prompted 3 chatbots to create 6 interaction logs. We recruited 17 therapists and conducted study sessions that consisted of 3 activities. First, therapists responded to the 2 scenarios using a Qualtrics form. Second, therapists went through the 6 interaction logs using a think-aloud procedure to highlight their thoughts about the chatbots' responses. Finally, we conducted a semistructured interview to explore subjective opinions on the use of chatbots for supporting mental health. The study sessions were analyzed using thematic analysis. The interaction logs from chatbot and therapist responses were coded using the Multitheoretical List of Therapeutic Interventions codes and then compared to each other.</p><p><strong>Results: </strong>We identified 7 themes describing the strengths and limitations of the chatbots as compared to therapists. These include elements of good therapy in chatbot responses, conversational style of chatbots, insufficient inquiry and feedback seeking by chatbots, chatbot interventions, client engagement, chatbots' responses to crisis situations, and considerations for chatbot-based therapy. In the use of Multitheoretical List of Therapeutic Interventions codes, we found that therapists evoked more elaboration (Mann-Whitney U=9; P=.001) and used more self-disclosure (U=45.5; P=.37) as compared to the chatbots. The chatbots used affirming (U=28; P=.045) and reassuring (U=23; P=.02) language more often than the therapists. The chatbots also used psychoeducation (U=22.5; P=.02) and suggestions (U=12.5; P=.003) more often than the therapists.</p><p><strong>Conclusions: </strong>Our study demonstrates the unsuitability of general-purpose chatbots to safely engage in mental health conversations, particularly in crisis situations. While chatbots display elements of good therapy, such as validation and reassurance, overuse of directive advice without sufficient inquiry and use of generic interventions make them unsuitable as therapeutic agents. 
Careful research and evaluation will be necessary to determine the impact of chatbot interactions and to identify the most appropriate use cases related to mental health.</p>","PeriodicalId":48616,"journal":{"name":"Jmir Mental Health","volume":"12 ","pages":"e69709"},"PeriodicalIF":4.8000,"publicationDate":"2025-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jmir Mental Health","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.2196/69709","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"PSYCHIATRY","Score":null,"Total":0}
Citations: 0

Abstract

Background: Consumers are increasingly using large language model-based chatbots to seek mental health advice or intervention due to ease of access and limited availability of mental health professionals. However, their suitability and safety for mental health applications remain underexplored, particularly in comparison to professional therapeutic practices.

Objective: This study aimed to evaluate how general-purpose chatbots respond to mental health scenarios and compare their responses to those provided by licensed therapists. Specifically, we sought to identify chatbots' strengths and limitations, as well as the ethical and practical considerations necessary for their use in mental health care.

Methods: We conducted a mixed methods study to compare responses from chatbots and licensed therapists to scripted mental health scenarios. We created 2 fictional scenarios and prompted 3 chatbots to create 6 interaction logs. We recruited 17 therapists and conducted study sessions that consisted of 3 activities. First, therapists responded to the 2 scenarios using a Qualtrics form. Second, therapists went through the 6 interaction logs using a think-aloud procedure to highlight their thoughts about the chatbots' responses. Finally, we conducted a semistructured interview to explore subjective opinions on the use of chatbots for supporting mental health. The study sessions were analyzed using thematic analysis. The interaction logs from chatbot and therapist responses were coded using the Multitheoretical List of Therapeutic Interventions codes and then compared to each other.

Results: We identified 7 themes describing the strengths and limitations of the chatbots as compared to therapists. These include elements of good therapy in chatbot responses, conversational style of chatbots, insufficient inquiry and feedback seeking by chatbots, chatbot interventions, client engagement, chatbots' responses to crisis situations, and considerations for chatbot-based therapy. In the Multitheoretical List of Therapeutic Interventions coding, we found that therapists evoked more elaboration (Mann-Whitney U=9; P=.001) and used more self-disclosure (U=45.5; P=.37) compared with the chatbots. The chatbots used affirming (U=28; P=.045) and reassuring (U=23; P=.02) language more often than the therapists. The chatbots also used psychoeducation (U=22.5; P=.02) and suggestions (U=12.5; P=.003) more often than the therapists.
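As a rough illustration of the comparison reported above, the sketch below tallies per-respondent counts of a single intervention code and runs a Mann-Whitney U test with SciPy. The counts are invented placeholders, not the study's data, and the study's exact test settings (for example, one- vs. two-sided) are not stated in the abstract.

```python
# Minimal sketch, assuming each respondent's transcript has been coded and
# the number of times a given MULTI code (e.g., "psychoeducation") appears
# has been counted per respondent. All numbers below are hypothetical.
from scipy.stats import mannwhitneyu

# One count per therapist (n=17) and one per chatbot interaction log (n=6).
therapist_counts = [0, 1, 0, 2, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 2, 0, 1]
chatbot_counts = [3, 4, 2, 5, 3, 4]

# Nonparametric rank test; no normality assumption on the code counts.
u_stat, p_value = mannwhitneyu(
    therapist_counts, chatbot_counts, alternative="two-sided"
)
print(f"U={u_stat}, P={p_value:.3f}")
```

A rank-based test is a natural fit here because the groups are small and unbalanced and code frequencies are skewed counts, which is consistent with the U statistics and exact-style P values reported in the abstract.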

Conclusions: Our study demonstrates the unsuitability of general-purpose chatbots to safely engage in mental health conversations, particularly in crisis situations. While chatbots display elements of good therapy, such as validation and reassurance, overuse of directive advice without sufficient inquiry and use of generic interventions make them unsuitable as therapeutic agents. Careful research and evaluation will be necessary to determine the impact of chatbot interactions and to identify the most appropriate use cases related to mental health.

Source Journal: JMIR Mental Health (Medicine - Psychiatry and Mental Health)
CiteScore: 10.80
Self-citation rate: 3.80%
Articles published per year: 104
Review time: 16 weeks
Journal Description: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focuses on digital health and Internet interventions, technologies and electronic innovations (software and hardware) for mental health, addictions, online counselling and behaviour change. This includes formative evaluation and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.