Acoustic signatures of depression elicited by emotion-based and theme-based speech tasks

Qunxing Lin, Xiaohua Wu, Yueshiyuan Lei, Wanying Cheng, Shan Huang, Weijie Wang, Chong Li, Jiubo Zhao

BMJ Mental Health, published 2025-09-29. DOI: 10.1136/bmjment-2025-301858
Citations: 0
Abstract
BACKGROUND
Major depressive disorder (MDD) remains underdiagnosed worldwide, partly due to reliance on self-reported symptoms and clinician-administered interviews.
OBJECTIVE
This study examined whether a speech-based classification model using emotionally and thematically varied image-description tasks could effectively distinguish individuals with MDD from healthy controls.
METHODS
A total of 120 participants (59 with MDD, 61 healthy controls) completed four speech tasks: three emotionally valenced images (positive, neutral, negative) and one Thematic Apperception Test (TAT) stimulus. Speech responses were segmented, and 23 acoustic features were extracted per sample. Classification was performed using a long short-term memory (LSTM) neural network, with SHapley Additive exPlanations (SHAP) applied for feature interpretation. Four traditional machine learning models (support vector machine, decision tree, k-nearest neighbour, random forest) served as comparators. Within-subject variation in speech duration was assessed with repeated-measures analysis of variance (ANOVA).
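The classification setup described above can be illustrated with a minimal PyTorch sketch: an LSTM consumes a sequence of per-frame acoustic feature vectors (23 features per frame, matching the paper) and a linear head produces two-class logits (MDD vs. healthy control). The hidden size, layer count, and class names here are illustrative assumptions, not the architecture reported in the study.

```python
import torch
import torch.nn as nn

class SpeechLSTM(nn.Module):
    """Binary classifier over per-frame acoustic feature sequences.

    Hidden size (64) and single-layer depth are illustrative assumptions;
    only the 23-feature input dimension comes from the abstract.
    """
    def __init__(self, n_features: int = 23, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # logits: [MDD, healthy control]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features); h is the final hidden state
        _, (h, _) = self.lstm(x)          # h: (num_layers, batch, hidden)
        return self.head(h[-1])           # (batch, 2) class logits

model = SpeechLSTM()
dummy = torch.randn(4, 100, 23)           # 4 utterances, 100 frames each
logits = model(dummy)
print(logits.shape)                       # torch.Size([4, 2])
```

Using the final hidden state as the utterance summary is one common design choice for variable-length speech; it lets the recurrent state accumulate the temporal dynamics that the abstract credits for the LSTM's advantage over the static comparator models.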
FINDINGS
The LSTM model outperformed traditional classifiers, capturing temporal and dynamic speech patterns. The positive-valence image task achieved the highest accuracy (87.5%), followed by the negative-valence (85.0%), TAT (84.2%) and neutral-valence (81.7%) tasks. SHAP analysis highlighted task-specific contributions of pitch-related and spectral features. Significant differences in speech duration across tasks (p<0.01) indicated that affective valence influenced speech production.
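The within-subject duration comparison reported above corresponds to a one-way repeated-measures ANOVA with task as the within-subject factor. A hedged sketch using statsmodels' `AnovaRM` on synthetic data (the participant count, durations, and effect sizes below are fabricated for illustration only):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
tasks = ["positive", "neutral", "negative", "TAT"]

# Synthetic balanced design: one speech duration per participant per task.
rows = []
for pid in range(12):                    # 12 synthetic participants (illustrative)
    base = rng.normal(30.0, 5.0)         # per-subject baseline duration in seconds
    for i, task in enumerate(tasks):
        rows.append({"pid": pid, "task": task,
                     "duration": base + 2.0 * i + rng.normal(0.0, 1.0)})
df = pd.DataFrame(rows)

# Repeated-measures ANOVA: task is the within-subject factor.
res = AnovaRM(df, depvar="duration", subject="pid", within=["task"]).fit()
print(res.anova_table)                   # F value, dfs, and p-value for "task"
```

`AnovaRM` requires a balanced design (exactly one observation per subject per cell), which the loop above guarantees; with a real dataset, unbalanced cells would need aggregation first via the `aggregate_func` argument.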
CONCLUSIONS
Emotionally enriched and thematically ambiguous tasks enhanced automated MDD detection, with positive-valence stimuli providing the greatest discriminative power. SHAP interpretation underscored the importance of tailoring models to different speech inputs.
CLINICAL IMPLICATIONS
Speech-based models incorporating emotionally evocative and projective stimuli offer a scalable, non-invasive approach for early depression screening. Their reliance on natural speech supports cross-cultural application and reduces stigma and literacy barriers. Broader validation is needed to facilitate integration into routine screening and monitoring.