DeConformer-SENet:高效的可变形保形语音增强网络

IF 2.9 3区 工程技术 Q2 ENGINEERING, ELECTRICAL & ELECTRONIC
Man Li, Ya Liu, Li Zhou
{"title":"DeConformer-SENet:高效的可变形保形语音增强网络","authors":"Man Li,&nbsp;Ya Liu,&nbsp;Li Zhou","doi":"10.1016/j.dsp.2024.104787","DOIUrl":null,"url":null,"abstract":"<div><div>The Conformer model has demonstrated superior performance in speech enhancement by combining the long-range relationship modeling capability of self-attention with the local information processing ability of convolutional neural networks (CNNs). However, existing Conformer-based speech enhancement models struggle to balance performance and model complexity. In this work, we propose, DeConformer-SENet, an end-to-end time-domain deformable Conformer speech enhancement model, with modifications to both the self-attention and CNN components. Firstly, we introduce the time-frequency-channel self-attention (TFC-SA) module, which compresses information from each dimension of the input features into a one-dimensional vector. By calculating the energy distribution, this module models long-range relationships across three dimensions, reducing computational complexity while maintaining performance. Additionally, we replace standard convolutions with deformable convolutions, aiming to expand the receptive field of the CNN and accurately model local features. We validate our proposed DeConformer-SENet on the WSJ0-SI84 + DNS Challenge dataset. Experimental results demonstrate that DeConformer-SENet outperforms existing Conformer and Transformer models in terms of ESTOI and PESQ metrics, while also being more computationally efficient. Furthermore, ablation studies confirm that DeConformer-SENet improvements enhance the performance of conventional Conformer and reduce model complexity without compromising the overall effectiveness.</div></div>","PeriodicalId":51011,"journal":{"name":"Digital Signal Processing","volume":"156 ","pages":"Article 104787"},"PeriodicalIF":2.9000,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"DeConformer-SENet: An efficient deformable conformer speech enhancement network\",\"authors\":\"Man Li,&nbsp;Ya Liu,&nbsp;Li Zhou\",\"doi\":\"10.1016/j.dsp.2024.104787\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The Conformer model has demonstrated superior performance in speech enhancement by combining the long-range relationship modeling capability of self-attention with the local information processing ability of convolutional neural networks (CNNs). However, existing Conformer-based speech enhancement models struggle to balance performance and model complexity. In this work, we propose, DeConformer-SENet, an end-to-end time-domain deformable Conformer speech enhancement model, with modifications to both the self-attention and CNN components. Firstly, we introduce the time-frequency-channel self-attention (TFC-SA) module, which compresses information from each dimension of the input features into a one-dimensional vector. By calculating the energy distribution, this module models long-range relationships across three dimensions, reducing computational complexity while maintaining performance. Additionally, we replace standard convolutions with deformable convolutions, aiming to expand the receptive field of the CNN and accurately model local features. We validate our proposed DeConformer-SENet on the WSJ0-SI84 + DNS Challenge dataset. Experimental results demonstrate that DeConformer-SENet outperforms existing Conformer and Transformer models in terms of ESTOI and PESQ metrics, while also being more computationally efficient. Furthermore, ablation studies confirm that DeConformer-SENet improvements enhance the performance of conventional Conformer and reduce model complexity without compromising the overall effectiveness.</div></div>\",\"PeriodicalId\":51011,\"journal\":{\"name\":\"Digital Signal Processing\",\"volume\":\"156 \",\"pages\":\"Article 104787\"},\"PeriodicalIF\":2.9000,\"publicationDate\":\"2024-09-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Digital Signal Processing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1051200424004123\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1051200424004123","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

摘要

Conformer 模型将自我注意的长程关系建模能力与卷积神经网络(CNN)的局部信息处理能力相结合,在语音增强方面表现出卓越的性能。然而,现有的基于 Conformer 的语音增强模型很难在性能和模型复杂度之间取得平衡。在这项工作中,我们提出了端到端时域可变形 Conformer 语音增强模型 DeConformer-SENet,并对自注意和 CNN 部分进行了修改。首先,我们引入了时频信道自注意(TFC-SA)模块,它将输入特征的每个维度的信息压缩为一维向量。通过计算能量分布,该模块对三个维度的长程关系进行建模,从而在保持性能的同时降低了计算复杂度。此外,我们还用可变形卷积取代了标准卷积,旨在扩大 CNN 的感受野,并对局部特征进行精确建模。我们在 WSJ0-SI84 + DNS Challenge 数据集上验证了我们提出的 DeConformer-SENet。实验结果表明,DeConformer-SENet 在 ESTOI 和 PESQ 指标方面优于现有的 Conformer 和 Transformer 模型,同时计算效率也更高。此外,消融研究证实,DeConformer-SENet 的改进提高了传统 Conformer 的性能,降低了模型的复杂性,同时不影响整体效果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
DeConformer-SENet: An efficient deformable conformer speech enhancement network
The Conformer model has demonstrated superior performance in speech enhancement by combining the long-range relationship modeling capability of self-attention with the local information processing ability of convolutional neural networks (CNNs). However, existing Conformer-based speech enhancement models struggle to balance performance and model complexity. In this work, we propose, DeConformer-SENet, an end-to-end time-domain deformable Conformer speech enhancement model, with modifications to both the self-attention and CNN components. Firstly, we introduce the time-frequency-channel self-attention (TFC-SA) module, which compresses information from each dimension of the input features into a one-dimensional vector. By calculating the energy distribution, this module models long-range relationships across three dimensions, reducing computational complexity while maintaining performance. Additionally, we replace standard convolutions with deformable convolutions, aiming to expand the receptive field of the CNN and accurately model local features. We validate our proposed DeConformer-SENet on the WSJ0-SI84 + DNS Challenge dataset. Experimental results demonstrate that DeConformer-SENet outperforms existing Conformer and Transformer models in terms of ESTOI and PESQ metrics, while also being more computationally efficient. Furthermore, ablation studies confirm that DeConformer-SENet improvements enhance the performance of conventional Conformer and reduce model complexity without compromising the overall effectiveness.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Digital Signal Processing
Digital Signal Processing 工程技术-工程:电子与电气
CiteScore
5.30
自引率
17.20%
发文量
435
审稿时长
66 days
期刊介绍: Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing yet it aims to be the most innovative. The Journal invites top quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal. The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as: • big data• machine learning• internet of things• information security• systems biology and computational biology,• financial time series analysis,• autonomous vehicles,• quantum computing,• neuromorphic engineering,• human-computer interaction and intelligent user interfaces,• environmental signal processing,• geophysical signal processing including seismic signal processing,• chemioinformatics and bioinformatics,• audio, visual and performance arts,• disaster management and prevention,• renewable energy,
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信