{"title":"多模态抑郁检测的多层次时空图注意融合","authors":"Yujie Yang, Wenbin Zheng","doi":"10.1016/j.bspc.2025.108123","DOIUrl":null,"url":null,"abstract":"<div><div>Depression is a severe mental illness that affects hundreds of millions of people worldwide. In recent years, depression detection methods that integrate multimodal information have achieved significant results. However, limited by the small sample size of depression datasets, previous studies primarily focus on the impact of heterogeneous information in multimodal fusion, while deep interactions within each modality are often overlooked. Moreover, previous multimodal fusion methods often employed concatenation operations, which only allow modal features to be statically combined in the vector space and do not explicitly model the cross-modal semantic relationships. To address these issues, we propose a novel method named Multi-level Spatiotemporal Graph Attention Fusion (MSGAF), which enhances information interaction and sharing through multi-step fusion both within and between modalities. Specifically, within each modality containing multiple features, we designed a Multi-feature Temporal Fusion (MTF) module. The MTF module can fuse various features during the same time period to discover interactions among these features. For multimodal fusion, we adopt a multi-level fusion strategy to integrate these modalities, with the fusion process is represented as a Bidirectional Fusion Graph (BiFG). The graph attention mechanism is utilized to aggregate node information across the spatial neighborhood of the BiFG, which allows the graph structure to dynamically and adaptively capture the asymmetric relationships between modalities. Extensive experiments and analyses demonstrate the effectiveness of MSGAF, which achieves state-of-the-art performance on both the DAIC-WOZ and E-DAIC datasets. The code is available at: <span><span>https://github.com/wenbin-zheng/MSGAF</span><svg><path></path></svg></span></div></div>","PeriodicalId":55362,"journal":{"name":"Biomedical Signal Processing and Control","volume":"110 ","pages":"Article 108123"},"PeriodicalIF":4.9000,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-level spatiotemporal graph attention fusion for multimodal depression detection\",\"authors\":\"Yujie Yang, Wenbin Zheng\",\"doi\":\"10.1016/j.bspc.2025.108123\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Depression is a severe mental illness that affects hundreds of millions of people worldwide. In recent years, depression detection methods that integrate multimodal information have achieved significant results. However, limited by the small sample size of depression datasets, previous studies primarily focus on the impact of heterogeneous information in multimodal fusion, while deep interactions within each modality are often overlooked. Moreover, previous multimodal fusion methods often employed concatenation operations, which only allow modal features to be statically combined in the vector space and do not explicitly model the cross-modal semantic relationships. To address these issues, we propose a novel method named Multi-level Spatiotemporal Graph Attention Fusion (MSGAF), which enhances information interaction and sharing through multi-step fusion both within and between modalities. 
Specifically, within each modality containing multiple features, we designed a Multi-feature Temporal Fusion (MTF) module. The MTF module can fuse various features during the same time period to discover interactions among these features. For multimodal fusion, we adopt a multi-level fusion strategy to integrate these modalities, with the fusion process is represented as a Bidirectional Fusion Graph (BiFG). The graph attention mechanism is utilized to aggregate node information across the spatial neighborhood of the BiFG, which allows the graph structure to dynamically and adaptively capture the asymmetric relationships between modalities. Extensive experiments and analyses demonstrate the effectiveness of MSGAF, which achieves state-of-the-art performance on both the DAIC-WOZ and E-DAIC datasets. The code is available at: <span><span>https://github.com/wenbin-zheng/MSGAF</span><svg><path></path></svg></span></div></div>\",\"PeriodicalId\":55362,\"journal\":{\"name\":\"Biomedical Signal Processing and Control\",\"volume\":\"110 \",\"pages\":\"Article 108123\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-06-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biomedical Signal Processing and Control\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1746809425006342\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomedical Signal Processing and Control","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1746809425006342","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Multi-level spatiotemporal graph attention fusion for multimodal depression detection
Depression is a severe mental illness that affects hundreds of millions of people worldwide. In recent years, depression detection methods that integrate multimodal information have achieved significant results. However, limited by the small sample size of depression datasets, previous studies have primarily focused on the impact of heterogeneous information in multimodal fusion, while deep interactions within each modality are often overlooked. Moreover, previous multimodal fusion methods often employ concatenation operations, which only combine modal features statically in the vector space and do not explicitly model cross-modal semantic relationships. To address these issues, we propose a novel method named Multi-level Spatiotemporal Graph Attention Fusion (MSGAF), which enhances information interaction and sharing through multi-step fusion both within and between modalities. Specifically, within each modality containing multiple features, we design a Multi-feature Temporal Fusion (MTF) module. The MTF module fuses various features within the same time period to discover interactions among them. For multimodal fusion, we adopt a multi-level fusion strategy to integrate the modalities, with the fusion process represented as a Bidirectional Fusion Graph (BiFG). A graph attention mechanism aggregates node information across the spatial neighborhood of the BiFG, which allows the graph structure to dynamically and adaptively capture the asymmetric relationships between modalities. Extensive experiments and analyses demonstrate the effectiveness of MSGAF, which achieves state-of-the-art performance on both the DAIC-WOZ and E-DAIC datasets. The code is available at: https://github.com/wenbin-zheng/MSGAF
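The two components described in the abstract (per-modality temporal fusion of multiple feature streams, followed by graph-attention aggregation over a bidirectional fusion graph of modality nodes) can be pictured with a minimal PyTorch sketch. This is not the authors' implementation (see the linked repository for that); the module names, tensor shapes, attention-pooling choice, and fully connected adjacency used below are illustrative assumptions only.

# Minimal, illustrative sketch (not the authors' code; see the repository above).
# It mimics two ideas from the abstract: (1) fusing several feature streams of one
# modality over time (MTF-like), and (2) graph-attention aggregation over modality
# nodes connected by a bidirectional fusion graph (BiFG-like). Names, dimensions,
# and the fully connected adjacency are assumptions made for this example.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalFeatureFusion(nn.Module):
    """Hypothetical MTF-like block: fuse multiple feature streams of one modality
    at each time step, then pool over time with learned attention weights."""

    def __init__(self, num_features: int, dim: int):
        super().__init__()
        self.proj = nn.Linear(num_features * dim, dim)  # per-time-step feature fusion
        self.score = nn.Linear(dim, 1)                  # temporal attention scores

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, num_features, dim)
        b, t, f, d = x.shape
        fused = self.proj(x.reshape(b, t, f * d))       # (batch, time, dim)
        alpha = F.softmax(self.score(fused), dim=1)     # weights over time steps
        return (alpha * fused).sum(dim=1)               # (batch, dim) modality embedding


class GraphAttentionFusion(nn.Module):
    """Single-head graph attention over modality nodes; the adjacency matrix stands
    in for a bidirectional fusion graph, and attention weights can differ per edge
    direction (i->j vs. j->i)."""

    def __init__(self, dim: int):
        super().__init__()
        self.w = nn.Linear(dim, dim, bias=False)
        self.a = nn.Linear(2 * dim, 1, bias=False)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, n_nodes, dim); adj: (n_nodes, n_nodes), 1 where an edge exists
        h = self.w(nodes)
        n = h.size(1)
        hi = h.unsqueeze(2).expand(-1, -1, n, -1)       # source node i, repeated
        hj = h.unsqueeze(1).expand(-1, n, -1, -1)       # neighbour node j, repeated
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))      # restrict attention to graph edges
        attn = F.softmax(e, dim=-1)                     # normalise over each node's neighbours
        return F.elu(attn @ h)                          # aggregated node representations


if __name__ == "__main__":
    # Three hypothetical modalities (e.g. audio, video, text), each with its own
    # number of feature streams, fused into one node embedding per modality.
    audio = TemporalFeatureFusion(num_features=3, dim=64)(torch.randn(2, 50, 3, 64))
    video = TemporalFeatureFusion(num_features=4, dim=64)(torch.randn(2, 50, 4, 64))
    text = TemporalFeatureFusion(num_features=2, dim=64)(torch.randn(2, 50, 2, 64))
    nodes = torch.stack([audio, video, text], dim=1)    # (2, 3, 64) modality nodes
    adj = torch.ones(3, 3)                              # fully connected, bidirectional graph
    fused = GraphAttentionFusion(dim=64)(nodes, adj)
    print(fused.shape)                                  # torch.Size([2, 3, 64])

The fully connected adjacency here is only a placeholder; the paper's BiFG defines its own multi-level node and edge structure, and the linked repository contains the actual model.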
Journal introduction:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.