Deep learning-based detection of depression by fusing auditory, visual and textual clues
Background
Early detection of depression is crucial for implementing interventions. Deep learning-based computer vision (CV), semantic, and acoustic analysis have enabled automated processing of visual and auditory signals.
Objective
We proposed an automated depression detection model based on artificial intelligence (AI) that integrated visual, auditory, and textual clues. Moreover, we validated the model's performance in multiple scenarios, including interviews with a chatbot.
Methods
A chatbot for depressive symptom inquiry powered by GPT-2.0 was developed, and a brief affective interview task was designed as a supplement. Audio-video and textual clues were captured during the interviews, and features from the different modalities were fused using a multi-head cross-attention network. To validate the model's generalizability, we performed external validation with an independent dataset.
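The abstract does not specify the exact fusion architecture. The following is a minimal sketch of multi-head cross-attention fusion in PyTorch, with hypothetical feature dimensions, sequence lengths, and modality names, intended only to illustrate the general technique of letting each modality attend to the others before classification.

```python
# Minimal sketch of multimodal fusion via multi-head cross-attention.
# All dimensions and names below are assumptions, not the paper's design.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Each modality's features attend to the other two modalities' features."""
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # One cross-attention block per query modality.
        self.attn = nn.ModuleDict({
            m: nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for m in ("audio", "visual", "text")
        })
        self.classifier = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 2),  # depression vs. healthy control
        )

    def forward(self, feats: dict[str, torch.Tensor]) -> torch.Tensor:
        fused = []
        for m, x in feats.items():
            # Concatenate the other two modalities as keys/values.
            others = torch.cat([v for k, v in feats.items() if k != m], dim=1)
            out, _ = self.attn[m](query=x, key=others, value=others)
            fused.append(out.mean(dim=1))  # pool over the sequence dimension
        return self.classifier(torch.cat(fused, dim=-1))

# Usage: a batch of 4 interviews, each modality as a (batch, seq, dim) sequence.
feats = {m: torch.randn(4, 50, 256) for m in ("audio", "visual", "text")}
logits = CrossAttentionFusion()(feats)  # shape: (4, 2)
```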
Results
(1) In the internal validation set (152 depression patients and 118 healthy controls), the multimodal model demonstrated strong predictive power for depression in all scenarios, with an area under the curve (AUC) exceeding 0.950 and an accuracy over 0.930. In the chatbot symptomatic interview scenario, the model showed exceptional performance, achieving an AUC of 0.999. Specificity decreased slightly (0.883) in the brief affective interview task. The multimodal model outperformed its unimodal and bimodal counterparts. (2) For external validation in the chatbot symptomatic interview scenario, a geographically distinct dataset (55 depression patients and 45 healthy controls) was employed. The multimodal fusion model achieved an AUC of 0.978, though all modality combinations exhibited reduced performance compared to internal validation.
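For reference, a minimal sketch of how the reported metrics (AUC, accuracy, specificity) are conventionally computed from model outputs, using scikit-learn with illustrative toy values, not the study's data:

```python
# Toy example of the standard metric definitions; values are illustrative only.
import numpy as np
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

y_true = np.array([1, 1, 0, 0, 1, 0])               # 1 = depression, 0 = healthy control
y_score = np.array([0.9, 0.8, 0.2, 0.3, 0.7, 0.4])  # predicted probability of depression
y_pred = (y_score >= 0.5).astype(int)               # threshold at 0.5

auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
acc = accuracy_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)                        # true-negative rate among controls
print(f"AUC={auc:.3f}, accuracy={acc:.3f}, specificity={specificity:.3f}")
```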
Limitations
Longitudinal follow-up was not conducted in this study, and the model's applicability to severe depression requires further study.
Journal Introduction
The Journal of Affective Disorders publishes papers concerned with affective disorders in the widest sense: depression, mania, mood spectrum, emotions and personality, anxiety and stress. It is interdisciplinary and aims to bring together different approaches for a diverse readership. Top-quality papers dealing with any aspect of affective disorders will be accepted, including neuroimaging, cognitive neurosciences, genetics, molecular biology, experimental and clinical neurosciences, pharmacology, neuroimmunoendocrinology, intervention and treatment trials.