Title: Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+
Authors: Yu-sheng Tsao, Berlin Chen, J. Hung
DOI: 10.1109/IET-ICETA56553.2022.9971683 (https://doi.org/10.1109/IET-ICETA56553.2022.9971683)
Publication date: 2022-10-14
Publication type: Journal Article
Journal: IET Networks (impact factor 1.3; JCR Q3, Computer Science, Information Systems)
Citations: 0
Abstract
FullSubNet+ is a highly effective deep learning-based speech enhancement technique that employs a full-band and sub-band fusion model. It takes the short-time magnitude spectrogram and the real and imaginary parts of the complex-valued spectrogram as inputs to train a deep neural network that mainly comprises multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks. To capture the phase information of the input time-domain signal more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as an input to the FullSubNet+ framework. Preliminary experiments on the VoiceBank-DEMAND task indicate that, compared with the original FullSubNet+ arrangement, using STDCT spectrograms yields higher objective speech quality and intelligibility on the test set, as measured by PESQ and STOI scores, respectively. In addition, the STDCT-based FullSubNet+ achieves a real-time factor (RTF) of 0.229, lower than the 0.260 of the original FullSubNet+.
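The two quantities the abstract relies on can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, hop size, Hann window, and type-II orthonormal DCT are all assumptions chosen for illustration, and `stdct` and `real_time_factor` are hypothetical helper names. The RTF helper assumes the common definition of processing time divided by audio duration, which the abstract does not spell out.

```python
import numpy as np
from scipy.fft import dct, idct


def stdct(signal, frame_len=512, hop=256):
    """Short-time DCT sketch: frame, window, then DCT each frame.

    frame_len, hop, the Hann window and DCT type II are illustrative
    assumptions, not values taken from the paper.
    Returns a real-valued array of shape (frame_len, n_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    # The DCT of a real frame is real, so sign information is carried by
    # the coefficients themselves rather than by a separate phase term.
    return dct(frames * window, type=2, norm="ortho", axis=-1).T


def real_time_factor(processing_seconds, audio_seconds):
    """RTF under the assumed definition: processing time / audio duration."""
    return processing_seconds / audio_seconds


# Demo on one second of synthetic noise at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
spec = stdct(x)

# Each column can be inverted back to its windowed frame with the inverse DCT.
first_frame = idct(spec[:, 0], type=2, norm="ortho")
print(spec.shape, np.isrealobj(spec))
```

Unlike a complex STFT, which needs two real-valued feature maps (real and imaginary parts), this representation is a single real-valued map. That is consistent with the abstract's motivation of capturing phase information "more simply".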
IET Networks (Computer Science, Information Systems)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 41
Review time: 33 weeks
Journal description:
IET Networks covers the fundamental developments and advancing methodologies to achieve higher performance, optimized and dependable future networks. IET Networks is particularly interested in new ideas and superior solutions to the known and arising technological development bottlenecks at all levels of networking such as topologies, protocols, routing, relaying and resource-allocation for more efficient and more reliable provision of network services. Topics include, but are not limited to: Network Architecture, Design and Planning, Network Protocol, Software, Analysis, Simulation and Experiment, Network Technologies, Applications and Services, Network Security, Operation and Management.