Title: A gated leaky integrate-and-fire spiking neural network based on attention mechanism for multi-modal emotion recognition
Authors: Guoming Chen, Zhuoxian Qian, Shuang Qiu, Dong Zhang, Ruqi Zhou
Journal: Digital Signal Processing, vol. 165, Article 105322 (published 2025-05-13)
DOI: 10.1016/j.dsp.2025.105322
URL: https://www.sciencedirect.com/science/article/pii/S1051200425003446
Impact factor: 2.9; JCR: Q2 (Engineering, Electrical & Electronic)
Citation count: 0
Abstract
Multi-modal emotion recognition is a key research area in human-computer interaction. It requires processing heterogeneous multi-modal signals, which are difficult to align, while improving accuracy and reducing computational complexity. To address these challenges, we apply swarm decomposition to EEG signals to reduce noise and then extract Short-Time Fourier Transform features. Heatmap features are then derived from these signals, as well as from non-physiological signals such as facial expressions, voice, and text. The features from the different sources are aligned using the Discrete Wavelet Transform. We propose a Gated Leaky Integrate-and-Fire Spiking Convolutional Vision Transformer (GLIFCVT) framework for multi-modal emotion recognition. The framework uses visual features as the primary modality and incorporates a spiking gated attention mechanism to enhance multi-modal fusion and classification. In addition, we propose a novel loss function that combines Focal and Dice losses to address class imbalance. Experiments demonstrate that the proposed model consistently outperforms state-of-the-art methods in both accuracy and energy efficiency.
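The abstract does not give the form of the combined loss, so the following is only a minimal sketch of one common way to blend a focal term with a soft Dice term for multi-class classification. The function names, the `weight` mixing parameter, the `gamma` default, and the `eps` smoothing constant are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def focal_loss(probs, labels, gamma=2.0):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Probability assigned to the true class, clamped to avoid log(0).
        p_t = min(max(p[y], 1e-12), 1.0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(labels)


def dice_loss(probs, labels, eps=1e-6):
    """Soft Dice loss, averaged over classes (eps is an assumed smoothing term)."""
    n_classes = len(probs[0])
    dice_sum = 0.0
    for c in range(n_classes):
        # Predicted probability mass that lands on samples truly in class c.
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * intersection + eps) / (pred_mass + true_mass + eps)
    return 1.0 - dice_sum / n_classes


def focal_dice_loss(probs, labels, weight=0.5, gamma=2.0):
    """Convex combination of the two terms; the weighting scheme is hypothetical."""
    return (weight * focal_loss(probs, labels, gamma)
            + (1.0 - weight) * dice_loss(probs, labels))


if __name__ == "__main__":
    # Two samples, two classes: each row is a predicted class distribution.
    predictions = [[0.9, 0.1], [0.2, 0.8]]
    targets = [0, 1]
    print(focal_dice_loss(predictions, targets))
```

The focal term down-weights samples that are already classified confidently, while the Dice term scores overlap per class, so rare classes count as much as frequent ones. That is the usual motivation for pairing the two when classes are imbalanced.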
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy