Deep learning-based detection of depression by fusing auditory, visual and textual clues
Background
Early detection of depression is crucial for implementing interventions. Deep learning-based computer vision (CV), semantic, and acoustic analysis have enabled automated processing of visual and auditory signals.
Objective
We proposed an automated depression detection model based on artificial intelligence (AI) that integrated visual, auditory, and textual clues. Moreover, we validated the model's performance in multiple scenarios, including interviews with a chatbot.
Methods
A chatbot for depressive symptom inquiry powered by GPT-2.0 was developed, and a brief affective interview task was designed as a supplement. Audio-video and textual clues were captured during the interviews, and features from the different modalities were fused using a multi-head cross-attention network. To validate the model's generalizability, we performed external validation with an independent dataset.
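The abstract does not specify the exact fusion architecture. The following is a minimal sketch of multi-head cross-attention fusion in PyTorch, with hypothetical feature dimensions, sequence lengths, and modality names, intended only to illustrate the general technique of letting each modality attend to the others before classification.

```python
# Minimal sketch of multimodal fusion via multi-head cross-attention.
# All dimensions and names below are assumptions, not the paper's design.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Each modality's features attend to the other two modalities' features."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One cross-attention block per query modality.
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for m in ("audio", "visual", "text")
        })
        self.classifier = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 2),  # depression vs. healthy control
        )

    def forward(self, feats: dict[str, torch.Tensor]) -> torch.Tensor:
        fused = []
        for m, x in feats.items():
            # Concatenate the other two modalities as keys/values.
            others = torch.cat([v for k, v in feats.items() if k != m], dim=1)
            out, _ = self.attn[m](query=x, key=others, value=others)
            fused.append(out.mean(dim=1))  # pool over the sequence dimension
        return self.classifier(torch.cat(fused, dim=-1))

# Usage: a batch of 4 interviews, each modality as a (batch, seq, dim) sequence.
feats = {m: torch.randn(4, 50, 256) for m in ("audio", "visual", "text")}
logits = CrossAttentionFusion()(feats)  # shape: (4, 2)
```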
Results
(1) In the internal validation set (152 depression patients and 118 healthy controls), the multimodal model demonstrated strong predictive power for depression in all scenarios, with an area under the curve (AUC) exceeding 0.950 and an accuracy over 0.930. In the chatbot symptomatic interview scenario, the model showed exceptional performance, achieving an AUC of 0.999. Specificity decreased slightly (0.883) in the brief affective interview task. The multimodal model outperformed its unimodal and bimodal counterparts. (2) For external validation in the chatbot symptomatic interview scenario, a geographically distinct dataset (55 depression patients and 45 healthy controls) was employed. The multimodal fusion model achieved an AUC of 0.978, though all modality combinations exhibited reduced performance compared to internal validation.
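For reference, a minimal sketch of how the reported metrics (AUC, accuracy, specificity) are conventionally computed from model outputs, using scikit-learn with illustrative toy values, not the study's data:

```python
# Toy example of the standard metric definitions; values are illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

y_true = np.array([1, 1, 0, 0, 1, 0])               # 1 = depression, 0 = healthy control
y_score = np.array([0.9, 0.8, 0.2, 0.3, 0.7, 0.4])  # predicted probability of depression
y_pred = (y_score >= 0.5).astype(int)               # threshold at 0.5

auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
acc = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)                        # true-negative rate among controls
print(f"AUC={auc:.3f}, accuracy={acc:.3f}, specificity={specificity:.3f}")
```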
Limitations
Longitudinal follow-up was not conducted in this study, and the model's applicability to severe depression requires further study.
Journal Introduction
The Journal of Affective Disorders publishes papers concerned with affective disorders in the widest sense: depression, mania, mood spectrum, emotions and personality, anxiety and stress. It is interdisciplinary and aims to bring together different approaches for a diverse readership. Top-quality papers dealing with any aspect of affective disorders will be accepted, including neuroimaging, cognitive neurosciences, genetics, molecular biology, experimental and clinical neurosciences, pharmacology, neuroimmunoendocrinology, intervention and treatment trials.