Investigating the interpretability of ChatGPT in mental health counseling: An analysis of artificial intelligence generated content differentiation
Authors: Yang Liu, Fan Wang
Journal: Computer Methods and Programs in Biomedicine, vol. 268, Article 108864
Published: 2025-05-20
DOI: 10.1016/j.cmpb.2025.108864
URL: https://www.sciencedirect.com/science/article/pii/S0169260725002810
Impact factor: 4.9; JCR: Q1 (Computer Science, Interdisciplinary Applications); CAS ranking: Medicine, Zone 2
Citations: 0
Abstract
The global impact of COVID-19 has caused a significant rise in the demand for psychological counseling services, creating pressure on existing mental health professionals. Large language models (LLMs), such as ChatGPT, are considered a novel solution for delivering online psychological counseling. However, performance evaluation, emotional expression, high levels of anthropomorphism, ethical issues, transparency, and privacy breaches need to be addressed before LLMs can be widely adopted.
This study aimed to evaluate ChatGPT’s effectiveness and emotional support capabilities in mental health counseling from both macro and micro perspectives, and to examine whether its psychological support abilities are comparable to those of human experts. Building on the macro-level evaluation, we conducted a deeper comparison of the linguistic differences between ChatGPT and human experts at the micro level. In addition, in response to current policy requirements regarding the labeling of AI-generated content, we explored how to identify artificial intelligence generated content (AIGC) in counseling texts and which micro-level linguistic features can effectively distinguish AIGC from user-generated content (UGC). Finally, the study addressed transparency, privacy breaches, and ethical concerns.
We used ChatGPT for psychological interventions, applying the LLM to a range of mental health issues. The BERTopic algorithm was used to evaluate the content across multiple mental health problems. Deep learning techniques were employed to differentiate between AIGC and UGC in psychological counseling responses. Furthermore, Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) were used to evaluate interpretability, providing deeper insight into the models' decision-making process and enhancing transparency.
At the macro level, ChatGPT performed comparably to human experts, exhibiting professionalism, diversity, empathy, and a high degree of human likeness, making it highly effective in counseling services. At the micro level, deep learning models achieved accuracy rates of 99.12% and 96.13% in distinguishing content generated by ChatGPT 3.5 and ChatGPT 4.0, respectively, from UGC. Interpretability analysis revealed that context, sentence structure, and emotional expression were key factors differentiating AIGC from UGC.
The findings highlight ChatGPT's potential to deliver effective online psychological counseling and demonstrate a reliable framework for distinguishing between artificial intelligence-generated and human-generated content. This study underscores the importance of leveraging large-scale language models to support mental health services while addressing high levels of anthropomorphism as well as ethical and practical challenges.
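The idea of classifying counseling replies as AIGC or UGC and then asking which words drove the decision can be illustrated with a small sketch. This is not the authors' pipeline: the abstract names deep learning models with LIME and SHAP, whereas the example below swaps in a multinomial naive Bayes classifier and a simple leave-one-word-out attribution, both written with the Python standard library only. The toy corpus, the labels, and all function names (`train_naive_bayes`, `prob_aigc`, `leave_one_out_attribution`) are hypothetical and invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only (not data from the study).
TRAIN = [
    ("i understand how difficult this must feel and it is important to seek support", "AIGC"),
    ("it is important to remember that you are not alone and support is available", "AIGC"),
    ("i understand your feelings and it is important to practice self care", "AIGC"),
    ("honestly been there too just talk to your mom she will get it", "UGC"),
    ("ugh that sounds rough i think you should just take a break", "UGC"),
    ("been there honestly just hang in there and talk to someone", "UGC"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for text, label in samples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return word_counts, doc_counts, vocab


def prob_aigc(text, model):
    """Return P(AIGC | text) under Laplace-smoothed multinomial naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[label] = score
    # Convert log scores to a normalized probability in a numerically stable way.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    return exp_scores["AIGC"] / sum(exp_scores.values())


def leave_one_out_attribution(text, model):
    """Score each word by how much P(AIGC) drops when that word is removed.

    This is a simple occlusion-style attribution, not LIME or SHAP.
    """
    tokens = tokenize(text)
    base = prob_aigc(text, model)
    attributions = []
    for i, tok in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        attributions.append((tok, base - prob_aigc(reduced, model)))
    return sorted(attributions, key=lambda pair: abs(pair[1]), reverse=True)


model = train_naive_bayes(TRAIN)
reply = "it is important to seek support"
print("P(AIGC):", round(prob_aigc(reply, model), 3))
for word, delta in leave_one_out_attribution(reply, model)[:3]:
    print(word, round(delta, 4))
```

A positive attribution means that removing the word lowers the predicted AIGC probability, so the word pushed the classifier toward the AIGC label. LIME and SHAP pursue the same goal with more principled machinery (local surrogate models and Shapley values, respectively), and would normally be applied to a much stronger classifier than this toy model.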
About the journal:
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustration of fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; the eventual distribution of demonstrable software to avoid duplication of effort; to provide a forum for discussion and improvement of existing software; to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.