Multimodal depression detection based on an attention graph convolution and transformer.

IF 2.6 | Zone 4, Engineering and Technology | JCR Q1, Mathematics

Mathematical Biosciences and Engineering | Pub Date: 2025-02-27 | DOI: 10.3934/mbe.2025024

Xiaowen Jia, Jingxia Chen, Kexin Liu, Qian Wang, Jialing He

Mathematical Biosciences and Engineering, vol. 22, no. 3, pp. 652-676.

Citations: 0

Abstract

Traditional depression detection methods typically rely on single-modal data and are therefore limited by individual differences, noise interference, and emotional fluctuations. To address the low accuracy of single-modal depression detection and the poor fusion of multimodal features from electroencephalogram (EEG) and speech signals, we propose a multimodal depression detection model based on EEG and speech signals, named multi-head attention-GCN_ViT (MHA-GCN_ViT). The model uses graph convolutional networks (GCN) and vision transformers (ViT) to extract the frequency-domain and spatiotemporal features of EEG signals and fuse them with the frequency-domain features of speech signals. First, a discrete wavelet transform (DWT) extracts wavelet features from 29 EEG channels. These features serve as node attributes in a feature matrix, and the Pearson correlation coefficient between channels is used to construct an adjacency matrix that represents the brain network structure. The resulting graph is fed into a GCN for deep feature learning, with a multi-head attention mechanism added to enhance the GCN's ability to represent brain networks. Second, a short-time Fourier transform (STFT) is used to extract 2D spectral features of the EEG signals and mel spectrogram features of the speech signals; both are processed by a ViT to obtain deep features. Finally, the features from the EEG and speech branches are fused at the decision level for depression classification. In five-fold cross-validation on the MODMA dataset, the model achieved an accuracy, precision, recall, and F1 score of 89.03%, 90.16%, 89.04%, and 88.83%, respectively, indicating a significant improvement in multimodal depression detection performance. MHA-GCN_ViT also showed robust performance in depression detection and broad applicability, with potential for extension to multimodal detection tasks for other psychological and neurological disorders.
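The graph-construction and fusion steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the random feature matrix stands in for DWT features, the use of absolute Pearson correlation with unit self-loops, the symmetric normalization, the single ReLU layer, the layer sizes, and the probability-averaging fusion rule are all assumptions made for illustration, and every function name is hypothetical. A real pipeline would typically use NumPy or PyTorch and would add the multi-head attention and ViT branches.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def adjacency_from_features(features):
    """Channel-by-channel adjacency from |Pearson r| (assumed), with self-loops of 1."""
    n = len(features)
    return [[1.0 if i == j else abs(pearson(features[i], features[j]))
             for j in range(n)] for i in range(n)]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are >= 1 due to self-loops."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj_norm, features, weights):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(adj_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def fuse_decisions(prob_lists):
    """Decision-level fusion by averaging per-branch class probabilities (assumed rule)."""
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / len(prob_lists)
            for c in range(n_classes)]


random.seed(0)
n_channels, n_feats, n_hidden = 29, 8, 4  # 29 channels from the abstract; other sizes assumed

# Placeholder node features standing in for per-channel DWT features.
X = [[random.gauss(0, 1) for _ in range(n_feats)] for _ in range(n_channels)]
# Randomly initialized (untrained) layer weights.
W = [[random.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_feats)]

A = adjacency_from_features(X)
H = gcn_layer(normalize_adjacency(A), X, W)

# Hypothetical class probabilities from three branches (e.g. GCN, EEG ViT, speech ViT).
fused = fuse_decisions([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]])
print(len(H), len(H[0]), fused)
```

Averaging is only one possible decision-level rule; weighted voting or a small classifier trained on the branch outputs would fit the same description, and the abstract does not say which one the authors used.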


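Source journal: Mathematical Biosciences and Engineering (Engineering and Technology: Mathematics, Interdisciplinary Applications)

CiteScore: 3.90

Self-citation rate: 7.70%

Articles published: 586

Review time: >12 weeks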

About the journal: Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer, and knowledge translation about complex data and information processing. MBE publishes Research articles (long, original research); Communications (short, novel research); Expository papers; Technology Transfer and Knowledge Translation reports (descriptions of new technologies and products); and Announcements and Industrial Progress and News (announcements and even advertisements, including major conferences).