{"title":"人工智能心理健康聊天机器人从基于规则的系统到大型语言模型的演变:系统回顾。","authors":"Yining Hua,Steve Siddals,Zilin Ma,Isaac Galatzer-Levy,Winna Xia,Christine Hau,Hongbin Na,Matthew Flathers,Jake Linardon,Cyrus Ayubcha,John Torous","doi":"10.1002/wps.21352","DOIUrl":null,"url":null,"abstract":"The rapid evolution of artificial intelligence (AI) chatbots in mental health care presents a fragmented landscape with variable clinical evidence and evaluation rigor. This systematic review of 160 studies (2020-2024) classifies chatbot architectures - rule-based, machine learning-based, and large language model (LLM)-based - and proposes a three-tier evaluation framework: foundational bench testing (technical validation), pilot feasibility testing (user engagement), and clinical efficacy testing (symptom reduction). While rule-based systems dominated until 2023, LLM-based chatbots surged to 45% of new studies in 2024. However, only 16% of LLM studies underwent clinical efficacy testing, with most (77%) still in early validation. Overall, only 47% of studies focused on clinical efficacy testing, exposing a critical gap in robust validation of therapeutic benefit. Discrepancies emerged between marketed claims (\"AI-powered\") and actual AI architectures, with many interventions relying on simple rule-based scripts. LLM-based chatbots are increasingly studied for emotional support and psychoeducation, yet they pose unique ethical concerns, including incorrect responses, privacy risks, and unverified therapeutic effects. Despite their generative capabilities, LLMs remain largely untested in high-stakes mental health contexts. This paper emphasizes the need for standardized evaluation and benchmarking aligned with medical AI certification to ensure safe, transparent and ethical deployment. The proposed framework enables clearer distinctions between technical novelty and clinical efficacy, offering clinicians, researchers and regulators ordered steps to guide future standards and benchmarks. To ensure that AI chatbots enhance mental health care, future research must prioritize rigorous clinical efficacy trials, transparent architecture reporting, and evaluations that reflect real-world impact rather than the well-known potential.","PeriodicalId":23858,"journal":{"name":"World Psychiatry","volume":"124 1","pages":"383-394"},"PeriodicalIF":65.8000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Charting the evolution of artificial intelligence mental health chatbots from rule-based systems to large language models: a systematic review.\",\"authors\":\"Yining Hua,Steve Siddals,Zilin Ma,Isaac Galatzer-Levy,Winna Xia,Christine Hau,Hongbin Na,Matthew Flathers,Jake Linardon,Cyrus Ayubcha,John Torous\",\"doi\":\"10.1002/wps.21352\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid evolution of artificial intelligence (AI) chatbots in mental health care presents a fragmented landscape with variable clinical evidence and evaluation rigor. This systematic review of 160 studies (2020-2024) classifies chatbot architectures - rule-based, machine learning-based, and large language model (LLM)-based - and proposes a three-tier evaluation framework: foundational bench testing (technical validation), pilot feasibility testing (user engagement), and clinical efficacy testing (symptom reduction). While rule-based systems dominated until 2023, LLM-based chatbots surged to 45% of new studies in 2024. 
However, only 16% of LLM studies underwent clinical efficacy testing, with most (77%) still in early validation. Overall, only 47% of studies focused on clinical efficacy testing, exposing a critical gap in robust validation of therapeutic benefit. Discrepancies emerged between marketed claims (\\\"AI-powered\\\") and actual AI architectures, with many interventions relying on simple rule-based scripts. LLM-based chatbots are increasingly studied for emotional support and psychoeducation, yet they pose unique ethical concerns, including incorrect responses, privacy risks, and unverified therapeutic effects. Despite their generative capabilities, LLMs remain largely untested in high-stakes mental health contexts. This paper emphasizes the need for standardized evaluation and benchmarking aligned with medical AI certification to ensure safe, transparent and ethical deployment. The proposed framework enables clearer distinctions between technical novelty and clinical efficacy, offering clinicians, researchers and regulators ordered steps to guide future standards and benchmarks. To ensure that AI chatbots enhance mental health care, future research must prioritize rigorous clinical efficacy trials, transparent architecture reporting, and evaluations that reflect real-world impact rather than the well-known potential.\",\"PeriodicalId\":23858,\"journal\":{\"name\":\"World Psychiatry\",\"volume\":\"124 1\",\"pages\":\"383-394\"},\"PeriodicalIF\":65.8000,\"publicationDate\":\"2025-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Psychiatry\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1002/wps.21352\",\"RegionNum\":1,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Psychiatry","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1002/wps.21352","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Medicine","Score":null,"Total":0}
Charting the evolution of artificial intelligence mental health chatbots from rule-based systems to large language models: a systematic review.
The rapid evolution of artificial intelligence (AI) chatbots in mental health care presents a fragmented landscape with variable clinical evidence and evaluation rigor. This systematic review of 160 studies (2020-2024) classifies chatbot architectures - rule-based, machine learning-based, and large language model (LLM)-based - and proposes a three-tier evaluation framework: foundational bench testing (technical validation), pilot feasibility testing (user engagement), and clinical efficacy testing (symptom reduction). While rule-based systems dominated until 2023, LLM-based chatbots surged to 45% of new studies in 2024. However, only 16% of LLM studies underwent clinical efficacy testing, with most (77%) still in early validation. Overall, only 47% of studies focused on clinical efficacy testing, exposing a critical gap in robust validation of therapeutic benefit. Discrepancies emerged between marketing claims ("AI-powered") and actual AI architectures, with many interventions relying on simple rule-based scripts. LLM-based chatbots are increasingly studied for emotional support and psychoeducation, yet they pose unique ethical concerns, including incorrect responses, privacy risks, and unverified therapeutic effects. Despite their generative capabilities, LLMs remain largely untested in high-stakes mental health contexts. This paper emphasizes the need for standardized evaluation and benchmarking aligned with medical AI certification to ensure safe, transparent and ethical deployment. The proposed framework enables clearer distinctions between technical novelty and clinical efficacy, offering clinicians, researchers and regulators ordered steps to guide future standards and benchmarks. To ensure that AI chatbots enhance mental health care, future research must prioritize rigorous clinical efficacy trials, transparent architecture reporting, and evaluations that reflect real-world impact rather than widely touted potential.
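To make the review's classification scheme concrete, the following is a minimal illustrative sketch in Python. It is not code from the paper: the `Study` record, the `tier_share` helper, and the example entries are all hypothetical, invented only to show how studies might be tagged by architecture and evaluation tier and then tallied the way the review's percentages are.

```python
# Hypothetical sketch of the review's taxonomy and three-tier framework.
# All names and example records below are illustrative assumptions,
# not artifacts of the published study.
from dataclasses import dataclass
from enum import Enum


class Architecture(Enum):
    RULE_BASED = "rule-based"
    MACHINE_LEARNING = "machine learning-based"
    LLM = "large language model (LLM)-based"


class EvaluationTier(Enum):
    BENCH = "foundational bench testing (technical validation)"
    PILOT = "pilot feasibility testing (user engagement)"
    CLINICAL = "clinical efficacy testing (symptom reduction)"


@dataclass
class Study:
    year: int
    architecture: Architecture
    tier: EvaluationTier


def tier_share(studies: list[Study], arch: Architecture,
               tier: EvaluationTier) -> float:
    """Fraction of studies with a given architecture that reached a given tier."""
    subset = [s for s in studies if s.architecture == arch]
    if not subset:
        return 0.0
    return sum(s.tier == tier for s in subset) / len(subset)


# Invented example corpus; a real one would hold all 160 reviewed studies.
studies = [
    Study(2024, Architecture.LLM, EvaluationTier.BENCH),
    Study(2024, Architecture.LLM, EvaluationTier.PILOT),
    Study(2023, Architecture.RULE_BASED, EvaluationTier.CLINICAL),
]
print(tier_share(studies, Architecture.LLM, EvaluationTier.CLINICAL))  # 0.0
```

Under this sketch, headline figures such as "16% of LLM studies underwent clinical efficacy testing" are simply `tier_share` evaluated over the full corpus of reviewed studies.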
Journal introduction:
World Psychiatry is the official journal of the World Psychiatric Association. It aims to disseminate information on significant clinical, service, and research developments in the mental health field.
World Psychiatry is published three times per year and is sent free of charge to psychiatrists. The recipient psychiatrists' names and addresses are provided by WPA member societies and sections. The language used in the journal is designed to be understandable by the majority of mental health professionals worldwide.