On The Use of Discrete Cosine Transform Polarity Spectrum in Speech Enhancement
Sisi Shi, Andrew Busch, K. Paliwal, T. Fickenscher
2020 28th European Signal Processing Conference (EUSIPCO), pp. 421-425. Published 2021-01-24. DOI: 10.23919/Eusipco47968.2020.9287832
Citations: 2
Abstract
This paper investigates the use of the short-time Discrete Cosine Transform (DCT) for speech enhancement. We denote the absolute values and signs of the DCT spectral coefficients as the Absolute Spectrum (AS) and Polarity Spectrum (PoS), respectively. We show theoretically that, under the constrained MMSE criterion, the noisy PoS is the best estimate of the clean PoS. To verify this experimentally, we analyse the effect of using the noisy PoS for signal resynthesis through objective and subjective measures. The results show that when the Instantaneous SNR (ISNR) is above 0 dB, recovery of the original speech signal deemed perfect can be obtained by modifying only the DCT absolute spectrum. By contrast, in the DFT domain an accurate estimate of the Phase Spectrum (PhS) might be required to achieve the same improvement in perceived speech quality. When perceived quality is measured against the Segmental SNR (SSNR), the PoS preserves speech quality better than the PhS for the same level of global distortion. Overall, the noisy PoS can be used as an estimate of the clean PoS without perceivable degradation in speech quality only if the ISNR of the noisy speech signal is above 0 dB or the SSNR is above 10.5 dB.
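The AS/PoS split and the resynthesis experiment (clean AS combined with the noisy PoS) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions rather than details from the paper: an orthonormal DCT-II, non-overlapping rectangular frames of 256 samples, a synthetic sum-of-sinusoids signal in white noise at 5 dB global SNR, and a segmental SNR clipped to [-10, 35] dB. The helper names (`dct_matrix`, `stdct`, `istdct`, `split_spectra`, `segmental_snr`) are hypothetical. The sketch does not reproduce the paper's MMSE derivation, its ISNR measure, or its results.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    m = np.arange(n)[None, :]  # time index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)  # rescale the DC row for orthonormality
    return c


def stdct(x: np.ndarray, n: int) -> np.ndarray:
    """Short-time DCT; assumption: non-overlapping rectangular frames."""
    pad = (-len(x)) % n  # zero-pad to a whole number of frames
    frames = np.concatenate([x, np.zeros(pad)]).reshape(-1, n)
    return frames @ dct_matrix(n).T  # one row of coefficients per frame


def istdct(coefs: np.ndarray, length: int) -> np.ndarray:
    """Inverse of stdct: per-frame inverse DCT, concatenate, trim the padding."""
    n = coefs.shape[1]
    return (coefs @ dct_matrix(n)).reshape(-1)[:length]


def split_spectra(coefs: np.ndarray):
    """Absolute Spectrum (AS) and Polarity Spectrum (PoS) of DCT coefficients."""
    return np.abs(coefs), np.sign(coefs)


def segmental_snr(clean, est, n=256, lo=-10.0, hi=35.0) -> float:
    """Frame-wise SNR in dB, clipped to [lo, hi] and averaged (assumed convention)."""
    m = (len(clean) // n) * n  # drop the incomplete last frame
    s = clean[:m].reshape(-1, n)
    e = (clean[:m] - est[:m]).reshape(-1, n)
    eps = np.finfo(float).eps  # avoids log of zero / division by zero
    snr = 10 * np.log10((np.sum(s**2, axis=1) + eps) / (np.sum(e**2, axis=1) + eps))
    return float(np.mean(np.clip(snr, lo, hi)))


rng = np.random.default_rng(0)
fs, n = 8000, 256  # assumed 8 kHz sampling, 32 ms frames
t = np.arange(fs) / fs  # 1 s toy signal standing in for speech
clean = sum(a * np.sin(2 * np.pi * f * t) for a, f in [(1.0, 220), (0.5, 440), (0.25, 1250)])
noise = rng.standard_normal(len(clean))
noise *= np.sqrt(np.mean(clean**2) / np.mean(noise**2) / 10 ** (5 / 10))  # 5 dB SNR
noisy = clean + noise

as_clean, _ = split_spectra(stdct(clean, n))
_, pos_noisy = split_spectra(stdct(noisy, n))
# Resynthesis with the clean AS and the *noisy* PoS (only the AS is "enhanced").
mixed = istdct(as_clean * pos_noisy, len(clean))

print(f"noisy input SSNR:          {segmental_snr(clean, noisy, n):.1f} dB")
print(f"clean AS + noisy PoS SSNR: {segmental_snr(clean, mixed, n):.1f} dB")
```

Because the DCT is real-valued, each coefficient's "phase" collapses to a single sign bit, which is why replacing the AS while keeping the noisy PoS is a natural DCT counterpart of magnitude-only enhancement with the noisy DFT phase.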