{"title":"多模态抑郁检测的多层次时空图注意融合","authors":"Yujie Yang, Wenbin Zheng","doi":"10.1016/j.bspc.2025.108123","DOIUrl":null,"url":null,"abstract":"<div><div>Depression is a severe mental illness that affects hundreds of millions of people worldwide. In recent years, depression detection methods that integrate multimodal information have achieved significant results. However, limited by the small sample size of depression datasets, previous studies primarily focus on the impact of heterogeneous information in multimodal fusion, while deep interactions within each modality are often overlooked. Moreover, previous multimodal fusion methods often employed concatenation operations, which only allow modal features to be statically combined in the vector space and do not explicitly model the cross-modal semantic relationships. To address these issues, we propose a novel method named Multi-level Spatiotemporal Graph Attention Fusion (MSGAF), which enhances information interaction and sharing through multi-step fusion both within and between modalities. Specifically, within each modality containing multiple features, we designed a Multi-feature Temporal Fusion (MTF) module. The MTF module can fuse various features during the same time period to discover interactions among these features. For multimodal fusion, we adopt a multi-level fusion strategy to integrate these modalities, with the fusion process is represented as a Bidirectional Fusion Graph (BiFG). The graph attention mechanism is utilized to aggregate node information across the spatial neighborhood of the BiFG, which allows the graph structure to dynamically and adaptively capture the asymmetric relationships between modalities. Extensive experiments and analyses demonstrate the effectiveness of MSGAF, which achieves state-of-the-art performance on both the DAIC-WOZ and E-DAIC datasets. The code is available at: <span><span>https://github.com/wenbin-zheng/MSGAF</span><svg><path></path></svg></span></div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"110 ","pages":"Article 108123"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-level spatiotemporal graph attention fusion for multimodal depression detection\",\"authors\":\"Yujie Yang, Wenbin Zheng\",\"doi\":\"10.1016/j.bspc.2025.108123\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Depression is a severe mental illness that affects hundreds of millions of people worldwide. In recent years, depression detection methods that integrate multimodal information have achieved significant results. However, limited by the small sample size of depression datasets, previous studies primarily focus on the impact of heterogeneous information in multimodal fusion, while deep interactions within each modality are often overlooked. Moreover, previous multimodal fusion methods often employed concatenation operations, which only allow modal features to be statically combined in the vector space and do not explicitly model the cross-modal semantic relationships. To address these issues, we propose a novel method named Multi-level Spatiotemporal Graph Attention Fusion (MSGAF), which enhances information interaction and sharing through multi-step fusion both within and between modalities. 
Specifically, within each modality containing multiple features, we designed a Multi-feature Temporal Fusion (MTF) module. The MTF module can fuse various features during the same time period to discover interactions among these features. For multimodal fusion, we adopt a multi-level fusion strategy to integrate these modalities, with the fusion process is represented as a Bidirectional Fusion Graph (BiFG). The graph attention mechanism is utilized to aggregate node information across the spatial neighborhood of the BiFG, which allows the graph structure to dynamically and adaptively capture the asymmetric relationships between modalities. Extensive experiments and analyses demonstrate the effectiveness of MSGAF, which achieves state-of-the-art performance on both the DAIC-WOZ and E-DAIC datasets. The code is available at: <span><span>https://github.com/wenbin-zheng/MSGAF</span><svg><path></path></svg></span></div></div>\",\"PeriodicalId\":55362,\"journal\":{\"name\":\"Biomedical Signal Processing and Control\",\"volume\":\"110 \",\"pages\":\"Article 108123\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biomedical Signal Processing and Control\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1746809425006342\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Signal Processing and Control","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1746809425006342","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Multi-level spatiotemporal graph attention fusion for multimodal depression detection
Depression is a severe mental illness that affects hundreds of millions of people worldwide. In recent years, depression detection methods that integrate multimodal information have achieved significant results. However, limited by the small sample size of depression datasets, previous studies have primarily focused on the impact of heterogeneous information in multimodal fusion, while deep interactions within each modality are often overlooked. Moreover, previous multimodal fusion methods often employ concatenation operations, which only combine modal features statically in the vector space and do not explicitly model cross-modal semantic relationships. To address these issues, we propose a novel method named Multi-level Spatiotemporal Graph Attention Fusion (MSGAF), which enhances information interaction and sharing through multi-step fusion both within and between modalities. Specifically, within each modality containing multiple features, we design a Multi-feature Temporal Fusion (MTF) module. The MTF module fuses various features within the same time period to discover interactions among them. For multimodal fusion, we adopt a multi-level fusion strategy to integrate the modalities, with the fusion process represented as a Bidirectional Fusion Graph (BiFG). A graph attention mechanism aggregates node information across the spatial neighborhood of the BiFG, which allows the graph structure to dynamically and adaptively capture the asymmetric relationships between modalities. Extensive experiments and analyses demonstrate the effectiveness of MSGAF, which achieves state-of-the-art performance on both the DAIC-WOZ and E-DAIC datasets. The code is available at: https://github.com/wenbin-zheng/MSGAF
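The two components described in the abstract (per-modality temporal fusion of multiple feature streams, followed by graph-attention aggregation over a bidirectional fusion graph of modality nodes) can be pictured with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the module names, tensor shapes, attention-pooling choice, and fully connected adjacency used below are illustrative assumptions only.

# Minimal, illustrative sketch (not the authors' code; see the repository above).
# It mimics two ideas from the abstract: (1) fusing several feature streams of one
# modality over time (MTF-like), and (2) graph-attention aggregation over modality
# nodes connected by a bidirectional fusion graph (BiFG-like). Names, dimensions,
# and the fully connected adjacency are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalFeatureFusion(nn.Module):
    """Hypothetical MTF-like block: fuse multiple feature streams of one modality
    at each time step, then pool over time with learned attention weights."""

    def __init__(self, num_features: int, dim: int):
        super().__init__()
        self.proj = nn.Linear(num_features * dim, dim)  # per-time-step feature fusion
        self.score = nn.Linear(dim, 1)                  # temporal attention scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, num_features, dim)
        b, t, f, d = x.shape
        fused = self.proj(x.reshape(b, t, f * d))       # (batch, time, dim)
        alpha = F.softmax(self.score(fused), dim=1)     # weights over time steps
        return (alpha * fused).sum(dim=1)               # (batch, dim) modality embedding


class GraphAttentionFusion(nn.Module):
    """Single-head graph attention over modality nodes; the adjacency matrix stands
    in for a bidirectional fusion graph, and attention weights can differ per edge
    direction (i->j vs. j->i)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, n_nodes, dim); adj: (n_nodes, n_nodes), 1 where an edge exists
        h = self.w(nodes)
        n = h.size(1)
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)       # source node i, repeated
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)       # neighbour node j, repeated
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))      # restrict attention to graph edges
        attn = F.softmax(e, dim=-1)                     # normalise over each node's neighbours
        return F.elu(attn @ h)                          # aggregated node representations


if __name__ == "__main__":
    # Three hypothetical modalities (e.g. audio, video, text), each with its own
    # number of feature streams, fused into one node embedding per modality.
    audio = TemporalFeatureFusion(num_features=3, dim=64)(torch.randn(2, 50, 3, 64))
    video = TemporalFeatureFusion(num_features=4, dim=64)(torch.randn(2, 50, 4, 64))
    text = TemporalFeatureFusion(num_features=2, dim=64)(torch.randn(2, 50, 2, 64))
    nodes = torch.stack([audio, video, text], dim=1)    # (2, 3, 64) modality nodes
    adj = torch.ones(3, 3)                              # fully connected, bidirectional graph
    fused = GraphAttentionFusion(dim=64)(nodes, adj)
    print(fused.shape)                                  # torch.Size([2, 3, 64])

The fully connected adjacency here is only a placeholder; the paper's BiFG defines its own multi-level node and edge structure, and the linked repository contains the actual model.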
Journal introduction:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.