Julia Ohse, Bakir Hadžić, Parvez Mohammed, Nicolina Peperkorn, Michael Danner, Akihiro Yorita, Naoyuki Kubota, Matthias Rätsch, Youssef Shiban
Zero-Shot Strike: Testing the generalisation capabilities of out-of-the-box LLM models for depression detection
Computer Speech and Language, vol. 88, Article 101663 (published 2024-05-11). DOI: 10.1016/j.csl.2024.101663. https://www.sciencedirect.com/science/article/pii/S0885230824000469
Citations: 0
Abstract
Depression is a significant global health challenge, yet many people suffering from depression remain undiagnosed. Furthermore, the assessment of depression can be subject to human bias. Natural Language Processing (NLP) models offer a promising solution. We investigated the potential of four NLP models (BERT, Llama2-13B, GPT-3.5, and GPT-4) for detecting depression in clinical interviews. Participants (N = 82) underwent clinical interviews and completed a self-report depression questionnaire. The NLP models inferred depression scores from the interview transcripts, and questionnaire cut-off values served as the reference for classifying participants as depressed. GPT-4 showed the highest accuracy for depression classification (F1 score 0.73), while zero-shot GPT-3.5 initially performed with low accuracy (0.34), improved to 0.82 after fine-tuning, and achieved 0.68 with clustered data. GPT-4's estimates of symptom severity (PHQ-8 score) correlated strongly (r = 0.71) with true symptom severity. These findings demonstrate the potential of AI models for depression detection. However, further research is necessary before widespread deployment can be considered.
About the journal:
Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language.
The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.