Single channel speech enhancement using time-frequency attention mechanism based nested U-net model

A. Prathipati, A.S.N. Chakravarthy
{"title":"Single channel speech enhancement using time-frequency attention mechanism based nested U-net model","authors":"A. Prathipati, A.S.N. Chakravarthy","doi":"10.1088/2631-8695/ad5e36","DOIUrl":null,"url":null,"abstract":"\n Deep-learning models have used attention mechanisms to improve the quality and intelligibility of noisy speech, demonstrating the effectiveness of attention mechanisms. We rely on either spatial or temporal-based attention mechanisms, resulting in severe information loss. In this paper, a time-frequency attention mechanism with a nested U-network (TFANUNet) is proposed for single-channel speech enhancement. By using time-frequency attention (TFA), learns the channel, frequency and time information which is more significant for speech enhancement. Basically, the proposed model is an encoder-decoder model, where each layer in the encoder and decoder is followed by a nested dense residual dilated DensNet (NDRD) based multi-scale context aggression block. NDRD involves multiple dilated convolution with different dilatation factors to explore the large receptive area at different scales simultaneously. NDRD avoids the aliasing problem in DenseNet. We integrated the TFA and NDRD blocks into the proposed model to enable refined feature set extraction without information loss and utterance-level context aggregation, respectively. The proposed TFANUNet model results outperform baselines in terms of STOI and PESQ.","PeriodicalId":505725,"journal":{"name":"Engineering Research Express","volume":"34 5","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Research Express","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1088/2631-8695/ad5e36","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep-learning models have used attention mechanisms to improve the quality and intelligibility of noisy speech, demonstrating the effectiveness of attention for speech enhancement. Existing models, however, rely on either spatial or temporal attention alone, which leads to severe information loss. In this paper, a time-frequency attention mechanism with a nested U-network (TFANUNet) is proposed for single-channel speech enhancement. The time-frequency attention (TFA) module learns the channel, frequency, and time information that is most significant for speech enhancement. The proposed model is an encoder-decoder network in which each encoder and decoder layer is followed by a multi-scale context-aggregation block based on a nested dense residual dilated DenseNet (NDRD). NDRD applies multiple dilated convolutions with different dilation factors to cover large receptive fields at several scales simultaneously, and it avoids the aliasing problem found in DenseNet. Integrating the TFA and NDRD blocks into the model enables refined feature extraction without information loss and utterance-level context aggregation, respectively. The proposed TFANUNet model outperforms the baselines in terms of STOI and PESQ.
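The abstract names the two building blocks but does not specify their internals, so the sketch below is a minimal PyTorch interpretation rather than the authors' implementation: a time-frequency attention module that gates the channel, time, and frequency axes of a (batch, channel, time, frequency) feature map, and a multi-scale dilated-convolution block in the spirit of NDRD. All module names, kernel sizes, dilation factors, and channel counts are illustrative assumptions.

```python
# Hedged sketch of the two ideas in the abstract: (1) time-frequency
# attention (TFA) over channel/time/frequency axes, (2) a multi-scale
# dilated-convolution block loosely following the NDRD idea. Details
# are assumptions; the paper's exact design is not given here.
import torch
import torch.nn as nn


class TimeFrequencyAttention(nn.Module):
    """Re-weights a (B, C, T, F) feature map along the channel, time,
    and frequency axes with lightweight squeeze-and-excitation gates."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # Channel gate: global average pool -> bottleneck MLP -> sigmoid.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Time / frequency gates: average the other axis away, then a
        # 1-D conv yields one attention weight per frame (or per bin).
        self.time_gate = nn.Sequential(
            nn.Conv1d(channels, 1, kernel_size=5, padding=2), nn.Sigmoid()
        )
        self.freq_gate = nn.Sequential(
            nn.Conv1d(channels, 1, kernel_size=5, padding=2), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)            # channel re-weighting
        w_t = self.time_gate(x.mean(dim=3))     # (B, 1, T) over time
        w_f = self.freq_gate(x.mean(dim=2))     # (B, 1, F) over frequency
        return x * w_t.unsqueeze(3) * w_f.unsqueeze(2)


class MultiScaleDilatedBlock(nn.Module):
    """Parallel dilated convolutions with different dilation factors,
    fused by a 1x1 convolution and a residual sum, to aggregate context
    at several receptive-field sizes simultaneously."""

    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=3,
                       padding=d, dilation=d) for d in dilations]
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)
        self.act = nn.PReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.act(self.fuse(out)) + x     # residual connection


if __name__ == "__main__":
    # Toy input: 2 spectrogram feature maps, 16 channels,
    # 100 time frames, 64 frequency bins.
    feats = torch.randn(2, 16, 100, 64)
    feats = TimeFrequencyAttention(16)(feats)
    feats = MultiScaleDilatedBlock(16)(feats)
    print(feats.shape)  # torch.Size([2, 16, 100, 64])
```

In the described architecture these blocks would sit after each encoder and decoder layer of the nested U-net; here they are simply chained on a toy input to show that the tensor shape passes through unchanged.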