A hybrid model using multimodal feature perception and multiple cross-attention fusion for depressive episodes detection

IF 15.5 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Yaqi Wang, Tingting Qu, Wenbo Zhu, Qi Wang, Yuping Cao, Renzhou Gui
DOI: 10.1016/j.inffus.2025.103354
Journal: Information Fusion, Volume 124, Article 103354
Published: 2025-06-04
URL: https://www.sciencedirect.com/science/article/pii/S1566253525004270
Citations: 0

Abstract

Depressive episodes are among the most prevalent manifestations of mood disorders worldwide. Currently, the diagnosis of depressive episodes primarily relies on professional clinical assessments. However, with the rising prevalence of depressive episodes, together with the increased diversity of subtypes, atypical presentations, and insidiousness of symptoms, timely and accurate detection of depressive episodes has become more difficult. To address this issue, a hybrid model based on multimodal feature perception and multiple cross-attention fusion (MFCAF) is proposed for the automated detection of depressive episodes. MFCAF integrates video, audio, and functional near-infrared spectroscopy (fNIRS) data collected under identical stimulus conditions. It consists of two primary phases: feature perception and feature fusion. In the feature perception stage, a multi-scale convolutional neural network (CNN) combined with a gated recurrent unit (GRU) is utilized to extract video features. Meanwhile, deep audio features are extracted by applying a Vision Transformer (ViT) to the heatmap generated from the correlation matrix of the Mel spectrogram. Additionally, a multi-channel CNN is used to extract fNIRS features. In the feature fusion stage, a Transformer-based multiple cross-attention fusion module is constructed to capture complex cross-modal dependencies. The experimental results show that, on the dataset collected from 122 participants, MFCAF can detect depressive episodes quickly and accurately, outperforming the baseline methods. The MFCAF model achieved an accuracy of 78.38% under the negative stimulus task. These results suggest that the proposed model holds promise as a rapid auxiliary detection tool for depressive episodes in large-scale populations.
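The fusion stage described above can be sketched in PyTorch. This is a minimal illustration of a Transformer-style multiple cross-attention fusion block over three modality feature sequences (video, audio, fNIRS); the embedding size, the choice of directed modality pairs, the pooling, and the classifier head are all assumptions for illustration, not the paper's exact MFCAF design.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of video, audio, and fNIRS
    features. Hyperparameters and pairing scheme are hypothetical."""

    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        # One cross-attention module per directed modality pair:
        # the query modality attends to keys/values from another modality.
        self.video_from_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.video_from_fnirs = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.audio_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fnirs_from_video = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Binary head: depressive episode vs. control.
        self.classifier = nn.Sequential(
            nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, video, audio, fnirs):
        # Each input: (batch, seq_len, dim); seq_len may differ per modality.
        v_a, _ = self.video_from_audio(video, audio, audio)
        v_f, _ = self.video_from_fnirs(video, fnirs, fnirs)
        a_v, _ = self.audio_from_video(audio, video, video)
        f_v, _ = self.fnirs_from_video(fnirs, video, video)
        # Residual connection + layer norm, then mean-pool over time
        # and concatenate the four fused views for classification.
        pooled = [self.norm(x + q).mean(dim=1)
                  for x, q in [(v_a, video), (v_f, video),
                               (a_v, audio), (f_v, fnirs)]]
        return self.classifier(torch.cat(pooled, dim=-1))

model = CrossAttentionFusion()
video = torch.randn(2, 10, 128)   # e.g. 10 video-frame features
audio = torch.randn(2, 16, 128)   # e.g. 16 audio-heatmap patch features
fnirs = torch.randn(2, 24, 128)   # e.g. 24 fNIRS channel features
logits = model(video, audio, fnirs)  # shape: (2, 2)
```

Because each `nn.MultiheadAttention` call uses one modality as the query and another as key/value, every fused representation is conditioned on a second modality, which is the general mechanism the abstract describes for capturing cross-modal dependencies.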


Source journal: Information Fusion (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles published: 161
Review time: 7.9 months
Journal introduction: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.