Title: Temporal convolutional network for speech bandwidth extension
Authors: Chundong Xu, Cheng Zhu, Xianpeng Ling, Dongwen Ying
Journal: China Communications (JCR Q2, Telecommunications; impact factor 3.1)
Publication date: 2023-11-01
DOI: https://doi.org/10.23919/jcc.fa.2021-0174.202311
Citations: 0
Abstract
In speech bandwidth extension, methods based on shallow statistical models struggle to achieve high speech quality. Deep learning has greatly improved the quality of the extended speech, but the high model complexity makes such models infeasible to run on client devices. To address both issues, this paper proposes an end-to-end speech bandwidth extension method based on a temporal convolutional neural network, which greatly reduces model complexity. In addition, a new time-frequency loss function is designed so that narrowband speech acquires a more accurate wideband mapping in both the time domain and the frequency domain. Experimental results show that the wideband speech reconstructed by the proposed method is superior to that of traditional heuristic rule-based approaches and conventional neural network methods in both subjective and objective evaluations.
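The abstract does not specify the network layout or the exact form of the time-frequency loss, so the following is only a rough sketch of the two general ideas it names, not the authors' method. It assumes that a temporal convolutional network is built from dilated causal 1-D convolutions, and that a time-frequency loss can be approximated as a weighted sum of a waveform mean squared error and a magnitude-spectrum mean squared error. All function names and the weight `alpha` are hypothetical.

```python
import cmath
import math


def dilated_causal_conv(x, kernel, dilation):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k * dilation]."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start act as zero padding
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one dilated causal conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


def magnitude_spectrum(frame):
    """DFT magnitudes for non-negative frequencies (naive O(N^2) transform)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags


def time_frequency_loss(pred, target, alpha=0.5):
    """Assumed form: alpha * waveform MSE + (1 - alpha) * magnitude-spectrum MSE."""
    time_term = sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)
    mp, mt = magnitude_spectrum(pred), magnitude_spectrum(target)
    freq_term = sum((a - b) ** 2 for a, b in zip(mp, mt)) / len(mt)
    return alpha * time_term + (1 - alpha) * freq_term
```

With a kernel size of 3 and dilations 1, 2, 4, 8, `receptive_field` returns 31 samples, which illustrates how a small stack of dilated layers can cover a long context with few parameters. In practice such a model would be written in a deep learning framework with a short-time Fourier transform in place of the naive DFT used here.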
About the journal:
China Communications (ISSN 1673-5447) is an English-language monthly journal cosponsored by the China Institute of Communications (CIC) and IEEE Communications Society (IEEE ComSoc). It is aimed at readers in industry, universities, research and development organizations, and government agencies in the field of Information and Communications Technologies (ICTs) worldwide.
The journal's main objective is to promote academic exchange in the ICTs sector and publish high-quality papers to contribute to the global ICTs industry. It provides instant access to the latest articles and papers, presenting leading-edge research achievements, tutorial overviews, and descriptions of significant practical applications of technology.
China Communications has been indexed in SCIE (Science Citation Index-Expanded) since January 2007. Additionally, all articles have been available in the IEEE Xplore digital library since January 2013.