Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study

Impact factor 4.8; Zone 2 (Medicine); JCR Q1, Psychiatry

JMIR Mental Health. Pub Date: 2024-03-18. DOI: 10.2196/53043

Zohar Elyoseph, Inbar Levkovich


Citations: 0

Abstract

Background: The current paradigm in mental health care focuses on clinical recovery and symptom remission. This model's efficacy is influenced by the therapist's trust in the patient's recovery potential and by the depth of the therapeutic relationship. Schizophrenia is a chronic illness with severe symptoms, and the possibility of recovery from it is a matter of debate. As artificial intelligence (AI) becomes integrated into the health care field, it is important to examine its ability to assess recovery potential in major psychiatric disorders such as schizophrenia.

Objective: To evaluate the ability of large language models (LLMs), in comparison with mental health professionals, to assess the prognosis of schizophrenia with and without professional treatment and its long-term positive and negative outcomes.

Methods: Vignettes were input into LLM interfaces and assessed ten times by four AI platforms: ChatGPT-3.5, ChatGPT-4, Google Bard, and Claude. A total of 80 evaluations were collected and benchmarked against existing norms on what mental health professionals (general practitioners, psychiatrists, clinical psychologists, and mental health nurses) and the general public think about schizophrenia prognosis with and without treatment and about the positive and negative long-term outcomes of schizophrenia interventions.

Results: For prognosis with professional help, ChatGPT-3.5 was notably pessimistic, whereas ChatGPT-4, Claude, and Bard aligned with professional views but differed from the general public. All LLMs predicted that schizophrenia would remain static or worsen without professional help. For long-term outcomes, ChatGPT-4 and Claude predicted more negative outcomes than Bard and ChatGPT-3.5. For positive outcomes, ChatGPT-3.5 and Claude were more negative than Bard and ChatGPT-4.

Conclusions: The finding that three of the four LLMs aligned closely with the predictions of mental health professionals in the "with treatment" condition demonstrates the potential of this technology for providing professional clinical prognosis. The pessimistic assessment of ChatGPT-3.5 is a disturbing finding, since it may reduce patients' motivation to start or persist with treatment for schizophrenia. Overall, while LLMs hold promise in augmenting health care, their application necessitates rigorous validation and a harmonious blend with human expertise.


Source journal

JMIR Mental Health (Medicine: Psychiatry and Mental Health)

CiteScore: 10.80

Self-citation rate: 3.80%

Articles published: 104

Review time: 16 weeks

About the journal: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focusses on digital health and Internet interventions, technologies, and electronic innovations (software and hardware) for mental health, addictions, online counselling, and behaviour change. This includes formative evaluations and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.