Huiyun Zhang , Zilong Pang , Puyang Zhao , Gaigai Tang , Lingfeng Shen , Guanghui Wang
Biomedical Signal Processing and Control, Volume 112, Article 108782. Published 2025-10-07. Impact factor 4.9 (JCR Q1, Engineering, Biomedical).
DOI: 10.1016/j.bspc.2025.108782
https://www.sciencedirect.com/science/article/pii/S1746809425012935
Pre-attentive speech signal processing with adaptive routing for emotion recognition
Emotion recognition from speech is essential for various applications in human–computer interaction, customer service, healthcare, and entertainment. However, developing robust and reproducible speech emotion recognition (SER) systems is challenging due to the complexity of emotions and the variability of speech signals. In this paper, we first define the concept of reproducibility in the context of deep learning models. We then introduce SpeechNet, a novel deep learning model designed to enhance reproducibility and robustness in SER. SpeechNet integrates speech recall, speech attention, and speech signal refinement modules to capture temporal dependencies and emotional cues in speech signals. Additionally, it incorporates a pre-attention mechanism and a modified routing technique to improve feature emphasis and processing efficiency. We also explore effective acoustic feature fusion techniques. Extensive experiments on several benchmark datasets demonstrate that SpeechNet achieves better performance and reproducibility than existing models. By addressing reproducibility and robustness, SpeechNet sets a new standard in SER, facilitating reliable and practical applications.
About the journal:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.