Investigating the interpretability of ChatGPT in mental health counseling: An analysis of artificial intelligence generated content differentiation

IF 4.9 · Zone 2 (Medicine) · JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer Methods and Programs in Biomedicine · Published: 2025-05-20 · DOI: 10.1016/j.cmpb.2025.108864

Yang Liu, Fan Wang

Volume 268, Article 108864 · Journal Article

Citations: 0

Abstract

The global impact of COVID-19 has caused a significant rise in the demand for psychological counseling services, creating pressure on existing mental health professionals. Large language models (LLMs), such as ChatGPT, are considered a novel solution for delivering online psychological counseling. However, issues of performance evaluation, emotional expression, high levels of anthropomorphism, ethics, transparency, and privacy breaches need to be addressed before LLMs can be widely adopted.

This study aimed to evaluate ChatGPT's effectiveness and emotional support capabilities in providing mental health counseling services from both macro and micro perspectives, to examine whether it possesses psychological support abilities comparable to those of human experts. Building on the macro-level evaluation, we conducted a deeper, micro-level comparison of the linguistic differences between ChatGPT and human experts. In addition, in response to current policy requirements regarding the labeling of AI-generated content, we explored how to identify artificial intelligence generated content (AIGC) in counseling texts and which micro-level linguistic features can effectively distinguish AIGC from user-generated content (UGC). Finally, the study addressed transparency, privacy breaches, and ethical concerns.
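The abstract does not list which micro-level linguistic features were examined. Purely as an illustrative sketch, assuming a handful of hypothetical surface features (sentence length, lexical diversity, pronoun use, exclamation marks), per-response features could be extracted like this before being passed to a classifier. The feature set, `TextFeatures`, and `extract_features` are assumptions for illustration, not the study's method.

```python
import re
from dataclasses import dataclass

# Hypothetical surface features chosen for illustration only;
# the study's actual micro-level features are not listed in the abstract.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SECOND_PERSON = {"you", "your", "yours", "yourself"}


@dataclass
class TextFeatures:
    n_sentences: int
    mean_sentence_length: float  # tokens per sentence
    type_token_ratio: float      # distinct tokens / all tokens (lexical diversity)
    first_person_rate: float     # share of tokens that are first-person pronouns
    second_person_rate: float    # share of tokens that are second-person pronouns
    exclamation_count: int       # crude proxy for emotional emphasis


def extract_features(text: str) -> TextFeatures:
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = max(len(tokens), 1)  # guard against division by zero
    return TextFeatures(
        n_sentences=len(sentences),
        mean_sentence_length=len(tokens) / max(len(sentences), 1),
        type_token_ratio=len(set(tokens)) / n_tokens,
        first_person_rate=sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        second_person_rate=sum(t in SECOND_PERSON for t in tokens) / n_tokens,
        exclamation_count=text.count("!"),
    )


f = extract_features("I hear you. You are not alone!")
print(f.n_sentences, f.mean_sentence_length)  # 2 3.5
```

Hand-crafted features like these are transparent but shallow; a deep learning classifier learns its own representation, which is why post-hoc tools such as LIME and SHAP are then needed to see what it relies on.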

We used ChatGPT for psychological interventions, applying the LLM to a range of mental health issues. The BERTopic algorithm was used to evaluate the content across multiple mental health problems. Deep learning techniques were employed to differentiate between AIGC and UGC in psychological counseling responses. Furthermore, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) were used to evaluate interpretability, providing deeper insight into the models' decision-making process and enhancing transparency.
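SHAP is built on Shapley values from cooperative game theory: each input token is credited with its average marginal contribution to the model's score over all subsets of the other tokens. The `shap` library approximates these for large models; for a handful of tokens they can be computed exactly. The sketch below does so for a hypothetical toy scorer (`toy_score`, its weights, and its interaction term are invented for illustration and are not the study's classifier), treating a token as "absent" when it is dropped from the input.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str]], float]


def shapley_values(tokens: Sequence[str], score: Scorer) -> List[float]:
    """Exact Shapley value of each token position.

    phi_i = sum over subsets S of the other positions of
            |S|! (n - |S| - 1)! / n! * (v(S plus i) - v(S)),
    where v(S) is the score of the text keeping only the tokens in S.
    Cost grows as 2**n, so this is only feasible for very short inputs.
    """
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                without_i = [tokens[j] for j in subset]
                phi[i] += weight * (score(with_i) - score(without_i))
    return phi


# Hypothetical "AIGC-likeness" weights, invented for this example only
# (a negative weight would push a response toward the UGC side).
WEIGHTS = {"furthermore": 1.5, "important": 1.0, "lol": -2.0}


def toy_score(kept: Sequence[str]) -> float:
    s = sum(WEIGHTS.get(t, 0.0) for t in kept)
    # Interaction: the two formal markers together add an extra 1.0.
    if "furthermore" in kept and "important" in kept:
        s += 1.0
    return s


tokens = ["furthermore", "it", "is", "important"]
print([round(v, 6) for v in shapley_values(tokens, toy_score)])
# [2.0, 0.0, 0.0, 1.5]
```

The values sum to `toy_score(tokens) - toy_score([]) = 3.5` (the efficiency property), and the interaction bonus is split evenly between the two tokens that create it. LIME takes a different route: it fits a locally weighted linear surrogate to the model's outputs on perturbed copies of a single input.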

At the macro level, ChatGPT demonstrated performance comparable to that of human experts, exhibiting professionalism, diversity, empathy, and a high degree of human likeness, which makes it highly effective in counseling services. At the micro level, deep learning models achieved accuracy rates of 99.12% and 96.13% in distinguishing content generated by ChatGPT 3.5 and ChatGPT 4.0, respectively, from UGC. Interpretability analysis revealed that context, sentence structure, and emotional expression were key factors differentiating AIGC from UGC.
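Accuracy, as reported above, is the share of responses whose predicted label (AIGC or UGC) matches the true label. The sketch below computes it from hypothetical predictions, alongside precision and recall, which reveal whether errors concentrate in one class. The function name `binary_metrics` and the example labels are illustrative assumptions, not the study's evaluation code or data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str = "AIGC") -> Dict[str, float]:
    """Accuracy, precision and recall for a two-class AIGC/UGC labelling."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty label sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        # Accuracy: share of all responses labelled correctly.
        "accuracy": (tp + tn) / len(pairs),
        # Precision: of the responses flagged as AIGC, how many really are.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of the true AIGC responses, how many were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for five responses (not data from the study).
y_true = ["AIGC", "AIGC", "UGC", "UGC", "AIGC"]
y_pred = ["AIGC", "UGC", "UGC", "UGC", "AIGC"]
print(binary_metrics(y_true, y_pred)["accuracy"])  # 0.8
```

On a balanced test set accuracy is a fair summary; if one class dominates, a classifier can score high accuracy while missing most of the minority class, which precision and recall expose.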

The findings highlight ChatGPT's potential to deliver effective online psychological counseling and demonstrate a reliable framework for distinguishing between AI-generated and human-generated content. This study underscores the importance of leveraging large language models to support mental health services while addressing high levels of anthropomorphism as well as ethical and practical challenges.


Source journal

Computer Methods and Programs in Biomedicine (Engineering & Technology: Engineering, Biomedical)

CiteScore: 12.30

Self-citation rate: 6.60%

Articles published: 601

Review time: 135 days

Journal description: To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine. Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.