Improved Encoder-Decoder Architecture With Human-Like Perception Attention for Monaural Speech Enhancement

IF 3.2 · JCR Region 2 (Engineering) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC
Hao Zhou;Yi Zhou;Zhenhua Cheng;Yu Zhao;Yin Liu
DOI: 10.1109/LSP.2025.3558690
Journal: IEEE Signal Processing Letters, vol. 32, pp. 1670-1674
Published: 2025-04-07 (Journal Article)
Full text: https://ieeexplore.ieee.org/document/10955229/
Citations: 0

Abstract

Speech enhancement (SE) models based on deep neural networks (DNNs) have shown excellent denoising performance. However, mainstream SE models often have high structural complexity and large parameter counts, requiring substantial computational resources, which limits their practical application. In this paper, a high-efficiency encoder-decoder structure, inspired by the top-down attention mechanism in human brain perception and named the human-like perception attention network (HPANet), is proposed for monaural speech enhancement; it is able to emulate brain perceptual attention in noisy environments. In HPANet, the raw waveform is first encoded by an attention encoder to capture shallow global features. These features are then downsampled, and multi-scale information is aggregated through a top attention module to prevent the loss of crucial information. Next, a down attention module integrates features from neighboring layers to reconstruct the signal in a top-down manner. Finally, the decoder reconstructs the denoised clean signal. Experiments show that the proposed method effectively reduces model complexity while maintaining competitive performance.
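The abstract's pipeline (encode → downsample into multiple scales → fuse top-down → decode) can be sketched at the shape level. Everything below is a placeholder illustration, not the paper's method: the internals of the attention encoder, top attention module, and down attention module are not specified in the abstract, so simple linear operations stand in for them, and all names and dimensions are assumptions made for the sketch.

```python
import numpy as np

# Shape-level sketch of the HPANet-style pipeline described in the
# abstract. All module bodies are placeholders (linear projections and
# averaging); only the data flow mirrors the abstract's description.

rng = np.random.default_rng(0)

def encode(x, dim=8):
    # placeholder "attention encoder": frame the waveform, project frames
    frames = x.reshape(-1, 16)                  # (num_frames, 16)
    w = rng.standard_normal((16, dim)) / 4.0
    return frames @ w                           # shallow global features

def downsample(f):
    # halve the time axis by averaging adjacent frames
    return 0.5 * (f[0::2] + f[1::2])

def fuse_top_down(coarse, fine):
    # placeholder "down attention": upsample the coarser scale and
    # merge it with features from the neighboring (finer) layer
    up = np.repeat(coarse, 2, axis=0)
    return 0.5 * (up + fine)

def decode(f):
    # placeholder decoder: project features back to waveform samples
    w = rng.standard_normal((f.shape[1], 16)) / 4.0
    return (f @ w).reshape(-1)

x = rng.standard_normal(256)                    # noisy input waveform
f1 = encode(x)                                  # (16, 8) fine scale
f2 = downsample(f1)                             # (8, 8)  coarser scale
f3 = downsample(f2)                             # (4, 8)  coarsest scale
g2 = fuse_top_down(f3, f2)                      # top-down reconstruction
g1 = fuse_top_down(g2, f1)
y = decode(g1)                                  # denoised waveform estimate

assert y.shape == x.shape                       # output matches input length
```

The point of the sketch is the multi-scale structure: each downsampling step trades temporal resolution for context, and the top-down fusion reinjects coarse context into finer scales before decoding, which is the mechanism the abstract credits with preventing the loss of crucial information.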
Source Journal
IEEE Signal Processing Letters (Engineering: Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles per year: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.