A gated leaky integrate-and-fire spiking neural network based on attention mechanism for multi-modal emotion recognition

IF 2.9 · Zone 3, Engineering & Technology · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC

Digital Signal Processing · Pub Date: 2025-05-13 · DOI: 10.1016/j.dsp.2025.105322

Guoming Chen, Zhuoxian Qian, Shuang Qiu, Dong Zhang, Ruqi Zhou

Volume 165, Article 105322. Full text: https://www.sciencedirect.com/science/article/pii/S1051200425003446

Citations: 0

Abstract

Multi-modal emotion recognition is a key research area in human-computer interaction. It requires processing heterogeneous multi-modal signals, which are difficult to align, while improving accuracy and reducing computational complexity. To address these challenges, we apply swarm decomposition to EEG signals to reduce noise and extract Short-Time Fourier Transform features. Heatmap features are then derived from these signals, as well as from other non-physiological signals such as facial expressions, voice, and text. The features from these sources are aligned using the Discrete Wavelet Transform. We propose a Gated Leaky Integrate-and-Fire Spiking Convolutional Vision Transformer (GLIFCVT) framework for multi-modal emotion recognition. This framework uses visual features as the primary modality and incorporates a spiking gated attention mechanism to enhance multi-modal fusion and classification. In addition, we propose a novel loss function that integrates Focal and Dice losses to address class imbalance. Experiments demonstrate that our proposed model consistently outperforms state-of-the-art methods in both accuracy and energy efficiency.
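Two ingredients named in the abstract, the leaky integrate-and-fire (LIF) neuron and a loss that combines Focal and Dice terms, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not specify the gating, the reset rule, or how the two loss terms are weighted, so the function names and the parameters `decay`, `threshold`, `gamma`, `lam`, and `eps` below are all assumptions chosen for illustration.

```python
import math


def lif_forward(currents, decay=0.9, threshold=1.0):
    """Run one plain LIF neuron over a sequence of input currents.

    Assumption: hard reset to 0 after a spike; no gating is modeled.
    Returns a list of 0/1 spikes, one per time step.
    """
    v = 0.0
    spikes = []
    for current in currents:
        v = decay * v + current      # leak the old potential, integrate input
        if v >= threshold:
            spikes.append(1)
            v = 0.0                  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_dice_loss(batch_logits, labels, gamma=2.0, lam=0.5, eps=1e-6):
    """Weighted sum of a focal term and a soft Dice term (assumed form).

    batch_logits: list of per-sample logit lists; labels: list of class ids.
    """
    probs = [softmax(logits) for logits in batch_logits]
    num_classes = len(batch_logits[0])

    # Focal term: down-weights samples that are already classified well.
    focal = 0.0
    for p, y in zip(probs, labels):
        pt = max(p[y], eps)          # clamp to avoid log(0)
        focal += -((1.0 - pt) ** gamma) * math.log(pt)
    focal /= len(labels)

    # Soft Dice term: per-class overlap between probabilities and one-hot labels.
    dice = 0.0
    for c in range(num_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_sum = sum(p[c] for p in probs)
        true_sum = sum(1 for y in labels if y == c)
        dice += 1.0 - (2.0 * inter + eps) / (pred_sum + true_sum + eps)
    dice /= num_classes

    return lam * focal + (1.0 - lam) * dice


if __name__ == "__main__":
    print(lif_forward([0.6, 0.6, 0.6], decay=0.5))
    print(focal_dice_loss([[2.0, 0.0], [0.0, 2.0]], [0, 1]))
```

The focal term concentrates the gradient on hard samples, while the Dice term scores each class by overlap rather than by sample count, which is one common reason to pair the two when classes are imbalanced.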


Source journal

Digital Signal Processing · Engineering & Technology: Engineering, Electrical & Electronic

CiteScore

5.30

Self-citation rate

17.20%

Articles published

435

Review time

66 days

About the journal: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy,