Voice of Mind, a Deep Learning Model for Depression and Anxiety Assessment From Acoustic and Lexical Vocal Biomarkers

IF 2.4, Zone 4 (Medicine), JCR Q1: AUDIOLOGY & SPEECH-LANGUAGE PATHOLOGY

Journal of Voice, Pub Date: 2025-09-25, DOI: 10.1016/j.jvoice.2025.09.012

S Regondi, F Roncone, V Colombo, R Pugliese, E Bagli, G Russo, A Panella, M Radavelli, S Bolognini


Citations: 0

Abstract


Voice of Mind, a Deep Learning Model for Depression and Anxiety Assessment From Acoustic and Lexical Vocal Biomarkers.

Objective: To develop a deep learning model that assesses anxiety and depression from acoustic and lexical biomarkers, analyzing Italian psychotherapy recordings and classifying each segment into one of three conditions: depression, anxiety, or no pathology.

Method: Five patients diagnosed with either Major Depressive Disorder or Generalized Anxiety Disorder were selected from psychotherapy sessions conducted at RAM Psyche. A total of seven audio recordings were manually analyzed by a clinical psychologist using the DASS-21 scale, resulting in over 1000 audio segments labeled for psychopathological content. From these recordings, acoustic features and lexical markers were extracted. These features were processed through a hybrid architecture combining a Convolutional Neural Network for Mel spectrogram analysis and a Multi-Layer Perceptron for integrating lexical and acoustic inputs. Three model variants (VOM 1.1, 1.2, and 1.3) were trained and evaluated using two custom datasets (DVOM2, DVOM3), including both internal patient audio and external neutral voices.
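The data flow of such a hybrid model can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation: the class name `HybridSketch`, the layer sizes, the random weights, and the five tabular features are all assumptions chosen for readability. A small convolutional branch turns a Mel spectrogram into an embedding, which is concatenated with a vector of acoustic and lexical features and passed through a multi-layer perceptron with a three-class softmax output.

```python
import math
import random

CLASSES = ("depression", "anxiety", "no pathology")


def conv2d_relu(image, kernel):
    """Valid 2D cross-correlation of one kernel over the image, then ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_avg_pool(fmap):
    """Collapse one feature map to a single scalar."""
    return sum(map(sum, fmap)) / (len(fmap) * len(fmap[0]))


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class HybridSketch:
    """Hypothetical CNN + MLP fusion model with untrained random weights."""

    def __init__(self, n_kernels=4, n_tabular=5, hidden=8, seed=0):
        rng = random.Random(seed)

        def rand(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        # CNN branch: a handful of 3x3 kernels over the spectrogram.
        self.kernels = [[rand(3) for _ in range(3)] for _ in range(n_kernels)]
        # MLP head: takes the CNN embedding concatenated with tabular features.
        fused = n_kernels + n_tabular
        self.w1 = [rand(fused) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rand(hidden) for _ in range(len(CLASSES))]
        self.b2 = [0.0] * len(CLASSES)

    def forward(self, mel, tabular):
        cnn_embedding = [global_avg_pool(conv2d_relu(mel, k))
                         for k in self.kernels]
        fused = cnn_embedding + list(tabular)
        hidden = [max(0.0, h) for h in dense(fused, self.w1, self.b1)]
        return softmax(dense(hidden, self.w2, self.b2))


# Toy inputs (made up): an 8-band x 10-frame "spectrogram" and five
# pre-normalised acoustic/lexical features for one audio segment.
rng = random.Random(1)
mel = [[rng.random() for _ in range(10)] for _ in range(8)]
tabular = [0.2, 0.5, 0.1, 0.7, 0.3]

model = HybridSketch()
probs = model.forward(mel, tabular)
print(dict(zip(CLASSES, probs)))
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows where the two input streams meet. In practice the same structure would be built and trained with a deep learning framework.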

Results: The model classified segments into depression, anxiety, and no pathology, with promising results. Feature importance analysis revealed that prosodic cues such as lower pitch, reduced intensity, and increased pauses were highly predictive of depression, while lexical richness and adverb usage were associated with both disorders. Among the model variants, VOM 1.1 showed balanced performance across all three classes, particularly excelling in detecting depression and no pathology. In contrast, VOM 1.2 prioritized depression and anxiety detection, occasionally misclassifying ambiguous cases as symptomatic, suggesting a heightened sensitivity to subtle pathological cues. VOM 1.3, while maintaining strong classification performance, demonstrated improved robustness on external neutral voices.
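Differences between variants of this kind are usually read off per-class precision and recall. The sketch below shows one way to compute them for a three-class problem. The function name `per_class_report` and the labels are invented for illustration and are not results from the study. The toy predictions mimic a model that leans toward symptomatic classes, so recall for "no pathology" drops while its precision stays high.

```python
from collections import Counter

CLASSES = ("depression", "anxiety", "no pathology")


def per_class_report(y_true, y_pred, classes=CLASSES):
    """Return {class: {"recall": ..., "precision": ...}} from label lists."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        actual = sum(n for (t, _), n in pairs.items() if t == c)
        predicted = sum(n for (_, p), n in pairs.items() if p == c)
        report[c] = {
            "recall": tp / actual if actual else 0.0,
            "precision": tp / predicted if predicted else 0.0,
        }
    return report


# Made-up segment labels: two of four healthy segments are flagged
# as symptomatic, and one anxiety segment is read as depression.
y_true = ["depression", "depression", "anxiety", "anxiety",
          "no pathology", "no pathology", "no pathology", "no pathology"]
y_pred = ["depression", "depression", "anxiety", "depression",
          "no pathology", "no pathology", "anxiety", "depression"]

report = per_class_report(y_true, y_pred)
for name, scores in report.items():
    print(f"{name}: recall={scores['recall']:.2f} "
          f"precision={scores['precision']:.2f}")
# depression: recall=1.00 precision=0.50
# anxiety: recall=0.50 precision=0.50
# no pathology: recall=0.50 precision=1.00
```

Reporting both numbers per class separates a model that rarely misses depression (high recall) from one that rarely raises false alarms (high precision), a trade-off that a single accuracy figure hides.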

Conclusions: The Voice of Mind model demonstrates the feasibility of using speech data to support mental health diagnostics. Its capacity to distinguish between depression and anxiety, while maintaining generalization across nonpathological voices, suggests its potential as a clinical decision-support tool.


Source journal

Journal of Voice (Medicine: Otorhinolaryngology)

CiteScore: 4.00

Self-citation rate: 13.60%

Articles published: 395

Review time: 59 days

About the journal: The Journal of Voice is widely regarded as the world's premier journal for voice medicine and research. This peer-reviewed publication is listed in Index Medicus and is indexed by the Institute for Scientific Information. The journal contains articles written by experts throughout the world on all topics in voice sciences, voice medicine and surgery, and speech-language pathologists' management of voice-related problems. The journal includes clinical articles, clinical research, and laboratory research. Members of the Foundation receive the journal as a benefit of membership.