Introducing CounseLLMe: A dataset of simulated mental health dialogues for comparing LLMs like Haiku, LLaMAntino and ChatGPT against humans
Edoardo Sebastiano De Duro, Riccardo Improta, Massimo Stella
Emerging trends in drugs, addictions, and health, Volume 5, Article 100170, published 2025-01-31
DOI: 10.1016/j.etdah.2025.100170
Abstract
We introduce CounseLLMe as a multilingual, multi-model dataset of 400 simulated mental health counselling dialogues between two state-of-the-art Large Language Models (LLMs). These conversations - of 20 quips each - were generated either in English (using OpenAI’s GPT 3.5 and Claude-3’s Haiku) or in Italian (with Claude-3’s Haiku and LLaMAntino), with prompts tuned with the help of a professional in psychotherapy. We investigate the resulting conversations by comparing them against human mental health conversations on the same topic of depression. To compare linguistic features, knowledge structure and emotional content between LLMs and humans, we employed textual forma mentis networks, i.e. cognitive networks where nodes represent concepts and links indicate syntactic or semantic relationships between concepts in the dialogues’ quips. We find that the emotional structure of LLM-LLM English conversations matches that of humans in terms of patient-therapist trust exchanges: 1 in 5 LLM-LLM quips contain trust across 10 conversational turns, versus the 24% rate found in humans. ChatGPT’s and Haiku’s simulated English patients can also reproduce human feelings of conflict and pessimism. However, human patients display non-negligible levels of anger/frustration that are missing in LLMs. Italian LLMs’ conversations are worse at reproducing human patterns. All LLM-LLM conversations reproduced human syntactic patterns of increased absolutist pronoun usage in patients and second-person, trust-inducing pronoun usage in therapists. Our results indicate that LLMs can realistically reproduce several aspects of human patient-therapist conversations, and we thus release CounseLLMe as a public dataset for novel data-informed opportunities in mental health and machine psychology.
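To illustrate the kind of analysis the abstract describes, the sketch below builds a crude word-co-occurrence network per quip and measures the fraction of quips containing a trust-related concept, mirroring the trust-rate comparison (1 in 5 LLM quips vs. 24% in humans). This is a minimal, hypothetical stand-in: real textual forma mentis networks use syntactic parsing and established emotion lexica, and the sample quips and `TRUST_WORDS` lexicon here are invented for demonstration only.

```python
from collections import defaultdict

# Invented toy quips (conversational turns); real CounseLLMe quips come
# from LLM-LLM counselling dialogues, not from this script.
quips = [
    "i always feel hopeless and alone",
    "i trust that together we can work through this",
    "nothing ever changes for me",
    "you can rely on this process and on yourself",
]

# Hypothetical trust lexicon; the paper relies on standard emotion lexica.
TRUST_WORDS = {"trust", "rely", "together", "faith"}

def quip_network(quip, window=2):
    """Link words co-occurring within a small window -- a crude stand-in
    for the syntactic/semantic links of a textual forma mentis network."""
    adj = defaultdict(set)
    words = quip.split()
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            adj[w].add(words[j])
            adj[words[j]].add(w)
    return adj

# Fraction of quips whose network contains at least one trust-related
# concept, analogous to the trust-rate statistic reported in the abstract.
trust_rate = sum(
    any(node in TRUST_WORDS for node in quip_network(q)) for q in quips
) / len(quips)
print(f"trust rate: {trust_rate:.2f}")
```

On these toy quips, two of the four turns contain a trust-related concept, so the script prints `trust rate: 0.50`; applied to a real corpus, the same per-quip statistic supports the LLM-vs-human comparison the paper reports.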