Stav Yosef, Moreah Zisquit, Ben Cohen, Anat Brunstein Klomek, Kfir Bar, Doron Friedman
{"title":"The impact of fine-tuning LLMs on the quality of automated therapy assessed by digital patients.","authors":"Stav Yosef, Moreah Zisquit, Ben Cohen, Anat Brunstein Klomek, Kfir Bar, Doron Friedman","doi":"10.1038/s44184-025-00159-1","DOIUrl":null,"url":null,"abstract":"<p><p>The use of generative large language models (LLMs) in mental health applications is gaining traction, with some proposals even suggesting LLM-based automated therapists. In this study, we assess the impact of fine-tuning therapist LLMs to improve the quality of therapy sessions, addressing a critical question in LLM-based mental health research. Specifically, we demonstrate that fine-tuning with datasets focused on specific therapeutic techniques significantly enhances the performance of LLM therapists. To facilitate this assessment, we introduce a novel evaluation system based on digital patients, powered by LLMs, which engage in text-based therapy sessions and provide session evaluations through questionnaires designed for human patients. This method addresses the inadequacies of traditional text-similarity metrics, which are insufficient for assessing the quality of therapeutic interactions. This study centers on motivational interviewing (MI), a structured and goal-oriented therapeutic approach. However, our digital therapists and patients can be adapted to work in other forms of therapy. We believe that our digital therapists offer a standardized method for assessing automated therapists and showcasing the potential of LLMs in mental health care.</p>","PeriodicalId":74321,"journal":{"name":"Npj mental health research","volume":"4 1","pages":"43"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433451/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Npj mental health research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s44184-025-00159-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The use of generative large language models (LLMs) in mental health applications is gaining traction, with some proposals even suggesting LLM-based automated therapists. In this study, we assess the impact of fine-tuning therapist LLMs to improve the quality of therapy sessions, addressing a critical question in LLM-based mental health research. Specifically, we demonstrate that fine-tuning with datasets focused on specific therapeutic techniques significantly enhances the performance of LLM therapists. To facilitate this assessment, we introduce a novel evaluation system based on digital patients, powered by LLMs, which engage in text-based therapy sessions and provide session evaluations through questionnaires designed for human patients. This method addresses the inadequacies of traditional text-similarity metrics, which are insufficient for assessing the quality of therapeutic interactions. This study centers on motivational interviewing (MI), a structured and goal-oriented therapeutic approach. However, our digital therapists and patients can be adapted to work in other forms of therapy. We believe that our digital therapists offer a standardized method for assessing automated therapists and showcasing the potential of LLMs in mental health care.