Artificial Bandwidth Extension Using H∞ Optimization, Deep Neural Network, and Speech Production Model
Authors: Deepika Gupta, H. S. Shekhawat
Venue: 2022 IEEE International Conference on Signal Processing and Communications (SPCOM)
Published: 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840805
Artificial bandwidth extension (ABE) is applied to speech signals to improve their quality in narrowband telephone communication. To this end, the missing high-frequency components of the speech signal are recovered by extrapolation. We propose an alternative structure that applies gain adjustment and discrete Fourier transform (DFT) addition to combine the narrowband signal with the corresponding estimated high-band signal. The high-band signal is estimated using a synthesis filter obtained from $H^{\infty}$ optimization and a speech production model. Because speech is non-stationary (time-varying), the synthesis filters vary considerably across the signal. We therefore use a feed-forward deep neural network (DNN) to estimate the synthesis filter information and gain factor from a given narrowband feature of the signal. Objective analysis is carried out on the RSR15 and TIMIT datasets, and separately for voiced and unvoiced speech. Subjective evaluation is conducted on the RSR15 dataset.
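The abstract does not spell out how the gain adjustment and DFT addition are carried out, so the following is only a minimal sketch of one plausible reading: the narrowband frame contributes the DFT bins below a cutoff, the estimated high-band frame contributes the bins at or above it after scaling by a gain factor, and the two masked spectra are added. The function name `combine_bands`, the 16 kHz sampling rate, the 4 kHz cutoff, and the per-frame scalar gain are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def combine_bands(nb_frame, hb_frame, gain, fs=16000, cutoff_hz=4000.0):
    """Add a narrowband frame and a gain-scaled high-band frame in the DFT domain.

    Hypothetical helper: both frames are assumed to be real-valued, equally
    long, and already sampled at the wideband rate `fs`.
    """
    nb_frame = np.asarray(nb_frame, dtype=float)
    hb_frame = np.asarray(hb_frame, dtype=float)
    if nb_frame.shape != hb_frame.shape:
        raise ValueError("narrowband and high-band frames must have equal length")

    n = nb_frame.size
    nb_spec = np.fft.rfft(nb_frame)           # one-sided DFT of the narrowband frame
    hb_spec = np.fft.rfft(hb_frame)           # one-sided DFT of the high-band estimate
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # bin centre frequencies in Hz

    low = freqs < cutoff_hz                   # bins kept from the narrowband signal
    high = ~low                               # bins kept from the high-band estimate

    # DFT addition: masked narrowband spectrum plus gain-adjusted high-band spectrum.
    wb_spec = nb_spec * low + gain * hb_spec * high
    return np.fft.irfft(wb_spec, n=n)         # back to a wideband time-domain frame
```

With a 320-sample frame at 16 kHz, passing a 1 kHz tone as `nb_frame` and a 6 kHz tone as `hb_frame` returns the 1 kHz tone plus the 6 kHz tone scaled by `gain`. In the proposed system the gain would come from the DNN rather than being set by hand.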