Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+

Impact factor: 1.3 · JCR: Q3 · Category: Computer Science, Information Systems
Yu-sheng Tsao, Berlin Chen, J. Hung
DOI: 10.1109/IET-ICETA56553.2022.9971683
URL: https://doi.org/10.1109/IET-ICETA56553.2022.9971683
Publication date: 2022-10-14
Publication type (as listed): Journal Article
Citations: 0

Abstract

FullSubNet+ is a highly effective deep learning-based speech enhancement technique built on a full-band and sub-band fusion model. It takes the short-time magnitude spectrogram together with the real and imaginary parts of the complex-valued spectrogram as inputs to train a deep neural network that mainly comprises multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks. To capture the phase information of the input time-domain signal more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as an input source for training the FullSubNet+ framework. Preliminary experiments on the VoiceBank-DEMAND task indicate that, on the test set, FullSubNet+ with STDCT spectrograms achieves higher objective speech quality and intelligibility, as measured by PESQ and STOI scores respectively, than the original FullSubNet+ arrangement. In addition, the STDCT-based FullSubNet+ obtains a real-time factor (RTF) of 0.229, lower than the 0.260 of the original FullSubNet+.
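To make the STDCT input and the RTF figure concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the frame length, hop size, window, or DCT variant, so the 512-sample frame, 256-sample hop, Hann window, and orthonormal DCT-II below are assumptions, and the function names `dct_ii_matrix`, `stdct`, and `real_time_factor` are hypothetical. The RTF helper uses the common definition (processing time divided by audio duration), which the abstract does not spell out. Because DCT coefficients are real and signed, a single real-valued array takes the place of the separate real and imaginary spectrograms.

```python
import numpy as np


def dct_ii_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; each row is one basis vector."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    t = np.arange(n)[None, :]   # sample index within a frame (columns)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)    # rescale the DC row so the matrix is orthonormal
    return basis


def stdct(signal: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Short-time DCT: window each frame, then apply a DCT-II to it.

    Returns a real array of shape (num_frames, frame_len).
    frame_len, hop and the Hann window are illustrative assumptions.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    basis = dct_ii_matrix(frame_len)
    frames = np.stack(
        [signal[i * hop: i * hop + frame_len] for i in range(num_frames)]
    )
    return (frames * window) @ basis.T


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1 is faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)   # one second of noise at an assumed 16 kHz rate
    spec = stdct(x)
    print(spec.shape)                # (61, 512)
```

Under this definition, an RTF of 0.229 corresponds to roughly 0.229 seconds of computation per second of audio on the hardware used for the measurement.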
Source journal
IET Networks (Computer Science, Information Systems)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 41
Review time: 33 weeks
Journal description: IET Networks covers fundamental developments and advancing methodologies for achieving higher-performance, optimized, and dependable future networks. The journal is particularly interested in new ideas and superior solutions to known and emerging technological bottlenecks at all levels of networking, such as topologies, protocols, routing, relaying, and resource allocation, for more efficient and more reliable provision of network services. Topics include, but are not limited to: Network Architecture, Design and Planning; Network Protocol, Software, Analysis, Simulation and Experiment; Network Technologies, Applications and Services; Network Security, Operation and Management.