Phase Spectrum Recovery for Enhancing Low-Quality Speech Captured by Laser Microphones
Chang Liu, Yang Ai, Zhenhua Ling
2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP), published 2021-01-24
DOI: 10.1109/ISCSLP49672.2021.9362112
This paper proposes a phase spectrum recovery method for enhancing low-quality speech captured by laser microphones, which is degraded by non-additive distortions during signal acquisition. Our preliminary study shows that common speech enhancement methods based on amplitude spectrum estimation cannot achieve satisfactory performance on this task. Therefore, this paper designs a speech enhancement model comprising an amplitude spectrum estimator (ASE) and a phase spectrum estimator (PSE). The ASE adopts autoregressive LSTMs and a multi-target learning framework to predict clean amplitude spectra from noisy ones. The PSE first adopts a waveform-based model to enhance noisy speech in the time domain, and then extracts phase spectra from the enhanced waveforms. Subsequently, the outputs of the two estimators are combined to reconstruct the final enhanced speech waveforms. Experimental results demonstrate that the proposed method achieves a higher PESQ score than both the method using only the ASE and waveform-based speech enhancement methods, including UNet and TCNN.
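The final combination step described in the abstract, pairing an estimated amplitude spectrum with a phase spectrum extracted from an enhanced waveform, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the ASE and PSE networks are not modeled here, and the function names (`stft`, `istft`, `combine_amplitude_and_phase`), the Hann window, the frame length of 512, and the hop of 128 are all assumptions chosen for illustration, since the abstract does not state the analysis parameters.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT (assumed settings); shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i] * window
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by ~0 at the very first and last samples.
    return out / np.maximum(norm, 1e-8)


def combine_amplitude_and_phase(amplitude, enhanced_waveform, n_fft=512, hop=128):
    """Rebuild a waveform from an amplitude estimate and a phase donor waveform.

    amplitude         : stand-in for the ASE output, shape (frames, n_fft // 2 + 1)
    enhanced_waveform : stand-in for the time-domain output of the PSE's
                        waveform-based model; only its phase is used
    """
    phase = np.angle(stft(enhanced_waveform, n_fft, hop))
    spectrum = amplitude * np.exp(1j * phase)
    return istft(spectrum, n_fft, hop)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(4096)  # placeholder signal, not real speech
    # Sanity check: with the true amplitude and the true phase donor,
    # the interior of the signal is recovered almost exactly.
    rebuilt = combine_amplitude_and_phase(np.abs(stft(clean)), clean)
    print(np.max(np.abs(rebuilt[512:-512] - clean[512:-512])))
```

In practice the amplitude and phase would come from two different models, so the combined spectrum is generally not a consistent STFT of any single waveform; the overlap-add inverse simply returns a least-squares-style approximation in that case.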