Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+

IF 1.3 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

IET Networks Pub Date : 2022-10-14 DOI:10.1109/IET-ICETA56553.2022.9971683

Yu-sheng Tsao, Berlin Chen, J. Hung


Citations: 0

Abstract


Exploiting Discrete Cosine Transform Features in Speech Enhancement Technique FullSubNet+

FullSubNet+ is a highly effective deep learning-based speech enhancement technique built on a full-band and sub-band fusion model. It takes the short-time magnitude spectrogram together with the real and imaginary parts of the complex-valued spectrogram as inputs to train a deep neural network composed mainly of multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks. To capture the phase information of the input time-domain signal more simply, we propose using the short-time discrete cosine transform (STDCT) spectrogram in place of the real and imaginary spectrograms as an input source for training the FullSubNet+ framework. Preliminary experiments on the VoiceBank-DEMAND task indicate that, on the test set, using STDCT spectrograms in FullSubNet+ yields higher objective speech quality and intelligibility, as measured by PESQ and STOI scores respectively, than the original FullSubNet+ arrangement. In addition, the STDCT-based FullSubNet+ achieves a real-time factor (RTF) of 0.229, lower than the 0.260 of the original FullSubNet+.
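To make the idea of a short-time DCT spectrogram concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the Hann window, the toy frame length and hop size, and the function names `dct_ii`, `stdct`, and `real_time_factor` are all assumptions chosen for illustration. The point it shows is that each frame yields a single real-valued coefficient vector, so sign information is carried directly rather than through separate real and imaginary maps.

```python
import math


def dct_ii(frame):
    """Orthonormal DCT-II of one real-valued frame (direct O(N^2) form)."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * acc)
    return coeffs


def stdct(signal, frame_len=8, hop=4):
    """Short-time DCT: window overlapping frames, then apply DCT-II to each.

    frame_len and hop are hypothetical toy values, and the Hann window is
    an assumption; the source does not state the analysis settings.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dct_ii(frame))
    return frames  # list of frames, each a list of real coefficients


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the processed audio."""
    return processing_seconds / audio_seconds


# Toy input: 32 samples of a sinusoid.
signal = [math.sin(2.0 * math.pi * 3 * t / 32) for t in range(32)]
spec = stdct(signal)
print(len(spec), "frames x", len(spec[0]), "real-valued coefficients")

# Relative RTF reduction implied by the two figures quoted in the abstract.
relative_reduction = (0.260 - 0.229) / 0.260
print(round(relative_reduction, 3))
```

An RTF below 1 means the audio is processed faster than it plays back; the two reported values correspond to a relative reduction of roughly 12%. In practice a fast DCT routine from a numerical library would replace the direct summation used here.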


Source journal

IET Networks (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 5.00

Self-citation rate: 0.00%

Publication volume:

Review time: 33 weeks

Journal description: IET Networks covers fundamental developments and advancing methodologies for achieving higher-performance, optimized and dependable future networks. IET Networks is particularly interested in new ideas and superior solutions to known and emerging technological bottlenecks at all levels of networking, such as topologies, protocols, routing, relaying and resource allocation, for more efficient and more reliable provision of network services. Topics include, but are not limited to: Network Architecture, Design and Planning; Network Protocol, Software, Analysis, Simulation and Experiment; Network Technologies, Applications and Services; Network Security, Operation and Management.