Pre-attentive speech signal processing with adaptive routing for emotion recognition

IF 4.9 | Zone 2 (Medicine) | Q1 ENGINEERING, BIOMEDICAL

Biomedical Signal Processing and Control, vol. 112, Article 108782 | Pub Date: 2025-10-07 | DOI: 10.1016/j.bspc.2025.108782

Huiyun Zhang , Zilong Pang , Puyang Zhao , Gaigai Tang , Lingfeng Shen , Guanghui Wang


Citations: 0

Abstract




Emotion recognition from speech is essential for various applications in human–computer interaction, customer service, healthcare, and entertainment. However, developing robust and reproducible speech emotion recognition (SER) systems is challenging due to the complexity of emotions and the variability of speech signals. In this paper, we first define the concept of reproducibility in the context of deep learning models. We then introduce SpeechNet, a novel deep learning model designed to enhance reproducibility and robustness in SER. SpeechNet integrates speech recall, speech attention, and speech signal refinement modules to capture temporal dependencies and emotional cues in speech signals. Additionally, it incorporates a pre-attention mechanism and a modified routing technique to improve feature emphasis and processing efficiency. We also explore effective acoustic feature fusion techniques. Extensive experiments on several benchmark datasets demonstrate that SpeechNet achieves better performance and reproducibility than existing models. By addressing reproducibility and robustness, SpeechNet sets a new standard in SER, facilitating reliable and practical applications.
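The abstract names three mechanisms (acoustic feature fusion, a pre-attention mechanism, and a modified routing technique) but does not specify how any of them is built. The following is a minimal, hypothetical sketch of one common way such a pipeline can be wired. It is not the authors' SpeechNet. The following choices are all assumptions made for illustration: concatenation as the fusion step, a per-dimension sigmoid gate applied before attention pooling as the "pre-attention", and mixture-of-experts style soft routing. The helper names (`fuse`, `pre_attention`, `route`) are hypothetical. In the SER literature, "routing" can also refer to capsule-network routing-by-agreement, which this sketch does not implement.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # rows are frames (inputs) or output units (weights)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse(mfcc: Matrix, prosody: Matrix) -> Matrix:
    """Hypothetical frame-level fusion: concatenate two feature streams per frame."""
    if len(mfcc) != len(prosody):
        raise ValueError("feature streams must have the same number of frames")
    return [m + p for m, p in zip(mfcc, prosody)]


def pre_attention(frames: Matrix, gate_w: Vector, score_w: Vector) -> Vector:
    """Hypothetical pre-attention: gate each feature dimension, then pool over time."""
    gate = [sigmoid(g) for g in gate_w]  # per-dimension emphasis in (0, 1)
    gated = [[x * g for x, g in zip(f, gate)] for f in frames]
    weights = softmax([dot(score_w, f) for f in gated])  # one weight per frame
    dim = len(gated[0])
    return [sum(w * f[d] for w, f in zip(weights, gated)) for d in range(dim)]


def route(x: Vector, router_w: Matrix, experts: List[Matrix]) -> Vector:
    """Hypothetical soft routing: weight each expert's linear output by a router softmax."""
    probs = softmax([dot(r, x) for r in router_w])  # one probability per expert
    outputs = [[dot(row, x) for row in expert] for expert in experts]
    n_out = len(outputs[0])
    return [sum(p * out[j] for p, out in zip(probs, outputs)) for j in range(n_out)]


if __name__ == "__main__":
    # Toy values only: 3 frames of 2 "MFCC" dims plus 1 "prosody" dim.
    frames = fuse([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[0.5], [0.2], [0.9]])
    utterance = pre_attention(frames, gate_w=[0.0, 0.0, 0.0], score_w=[0.0, 0.0, 0.0])
    experts = [[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [0, 0, 1]]]  # two experts, 2 outputs each
    logits = route(utterance, router_w=[[0.0] * 3, [0.0] * 3], experts=experts)
    print(utterance, logits)
```

With all-zero gate and score weights, every gate equals 0.5 and every frame gets the same attention weight, so the pooled vector is half the mean frame. With an all-zero router, the two experts are averaged. Soft routing keeps every branch differentiable. A hard top-1 router would skip the unselected branches and save compute at inference time. Which trade-off the paper's "modified routing technique" makes is not stated in the abstract.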


Source journal

Biomedical Signal Processing and Control (Engineering & Technology: Engineering, Biomedical)

CiteScore: 9.80
Self-citation rate: 13.70%
Articles published: 822
Review time: 4 months

Journal description: Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management. Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.