User Experience and Early Clinical Outcomes of a Mental Wellness Chatbot for Depression and Anxiety: Pilot Evaluation Mixed Methods Study.

Journal impact factor: 2.0; JCR Q3 (Health Care Sciences & Services)

JMIR Formative Research, 2026;10:e90644. Published: 2026-04-14. DOI: 10.2196/90644

Scott Graupensperger, Emily J Ward, Graham Baum, Kate H Bentley, Emily R Dworkin, Millard Brown, Adam Chekroud, Matt Hawrilenko

Abstract

Background: Artificial intelligence-powered conversational agents (ie, chatbots) are increasingly popular outlets for users seeking psychological support, yet little is known about how users experience early-stage prototypes or which therapeutic processes contribute to clinical improvement. A transparent evaluation of emerging chatbot prototypes is needed to clarify whether, how, and why artificial intelligence companions work and to guide their continued development.

Objective: This mixed methods pilot study evaluated user experience, acceptability, and preliminary clinical signals for an early-stage mental wellness chatbot. We also examined whether baseline symptom severity moderated clinical improvement.

Methods: Three sequential cohorts (n=125) completed a 2-week, incentivized chatbot exposure (approximately 60 min per week). Participants provided first-impression ratings, qualitative feedback, and pre-post assessments of depressive symptoms (PHQ-8 [Patient Health Questionnaire-8]), anxiety symptoms (GAD-7 [Generalized Anxiety Disorder-7]), psychological distress, well-being, and loneliness. Statistical models estimated symptom change and tested interactions with baseline symptom severity. Mixed methods analysis integrated quantitative outcomes with large language model-assisted qualitative content analysis of open-ended responses.
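Illustrative sketch (not from the paper): the abstract does not specify the moderation model. Assuming one simple version, in which each participant's pre-post change score is regressed on their baseline score by ordinary least squares, a negative slope means larger predicted reductions at higher baseline severity, and evaluating the fitted line at a chosen baseline (for example, 15) gives a model-implied change analogous to the marginal effects reported in the Results. The authors may instead have used a longitudinal model with a time × baseline interaction; the function names and toy data below are hypothetical.

```python
from statistics import mean


def fit_change_on_baseline(baseline, post):
    """Hypothetical helper: OLS fit of change = a + b * baseline.

    change is post minus baseline, so a negative slope b means that
    participants with higher baseline scores have larger predicted reductions.
    """
    if len(baseline) != len(post) or len(baseline) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    change = [p - b for b, p in zip(baseline, post)]
    mx, my = mean(baseline), mean(change)
    sxx = sum((x - mx) ** 2 for x in baseline)
    if sxx == 0:
        raise ValueError("baseline scores have no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(baseline, change))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predicted_change(intercept, slope, at_score):
    """Model-implied pre-post change for a participant starting at at_score."""
    return intercept + slope * at_score


if __name__ == "__main__":
    # Toy scores (hypothetical), built so that change = 2 - 0.5 * baseline exactly.
    baseline = [5, 10, 15, 20]
    post = [4.5, 7, 9.5, 12]
    a, b = fit_change_on_baseline(baseline, post)
    print(a, b)  # 2.0 -0.5
    print(predicted_change(a, b, 15))  # -5.5
```

With the toy data, the fitted slope is -0.5, so a participant starting at 15 has a predicted change of -5.5 points versus -0.5 at a baseline of 5. Expressing such predicted changes as Hedges g would additionally require dividing by a standard deviation, a choice the abstract does not report.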

Results: Participants described the chatbot as accessible, easy to use, and emotionally validating, while citing limitations in personalization and conversational depth. Qualitative responses consistently highlighted early therapeutic processes such as emotional validation, goal setting, and perceived attunement. Regression models showed significant pre-post reductions in depressive (Hedges g=-0.32) and anxiety (g=-0.32) symptoms, alongside modest improvements in distress and well-being. Baseline severity moderated improvement, with marginal effects indicating larger predicted reductions at higher PHQ-8 and GAD-7 baseline scores (eg, PHQ-8=15: g=-0.84; GAD-7=15: g=-0.62).
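Illustrative sketch (not from the paper): Hedges g is a standardized mean difference multiplied by a small-sample correction factor J = 1 - 3/(4·df - 1). For pre-post data, several standardizers are in use (baseline SD, SD of the change scores, or the pooled SD of both time points), and the abstract does not say which one was used. The sketch below assumes the pooled SD of the pre and post scores and df = n - 1; the function name and example scores are hypothetical. A negative g, as reported here, indicates lower symptom scores at post-test.

```python
import math
from statistics import mean, stdev


def hedges_g_prepost(pre, post):
    """Hypothetical helper: standardized pre-post change with Hedges' correction.

    Assumptions (not confirmed by the abstract): the mean change (post - pre)
    is divided by sqrt((SD_pre^2 + SD_post^2) / 2), and J uses df = n - 1.
    """
    n = len(pre)
    if n != len(post) or n < 3:
        raise ValueError("need paired samples with n >= 3")
    sd_pooled = math.sqrt((stdev(pre) ** 2 + stdev(post) ** 2) / 2)
    if sd_pooled == 0:
        raise ValueError("scores have no variance")
    d = (mean(post) - mean(pre)) / sd_pooled  # Cohen-style d; negative = improvement
    df = n - 1
    j = 1 - 3 / (4 * df - 1)  # Hedges' small-sample correction factor J
    return j * d


if __name__ == "__main__":
    # Toy symptom scores (hypothetical): every participant drops by 2 points.
    pre = [10, 12, 14, 16, 18]
    post = [8, 10, 12, 14, 16]
    print(round(hedges_g_prepost(pre, post), 3))  # -0.506
```

Here d = -2/√10 ≈ -0.632 and J = 0.8 for n = 5, giving g ≈ -0.506; with n = 125 the correction is close to 1, so g and d would differ only slightly.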

Conclusions: This pilot provides a comprehensive view of early chatbot development and suggests promising user experiences and preliminary symptom improvements under structured pilot conditions. By integrating experiential and exploratory clinical data, the study identifies candidate process targets to inform ongoing refinement. Findings support continued development and demonstrate procedural feasibility for progression to larger, longer-term trials evaluating engagement and clinical outcomes under more naturalistic conditions.
