On The Use of Discrete Cosine Transform Polarity Spectrum in Speech Enhancement

2020 28th European Signal Processing Conference (EUSIPCO) Pub Date : 2021-01-24 DOI:10.23919/Eusipco47968.2020.9287832

Sisi Shi, Andrew Busch, K. Paliwal, T. Fickenscher


Citations: 2

Abstract

This paper investigates the use of the short-time Discrete Cosine Transform (DCT) for speech enhancement. We denote the absolute values and signs of the DCT spectral coefficients as the Absolute Spectrum (AS) and Polarity Spectrum (PoS), respectively. We show theoretically that the noisy PoS is the best estimate of the original PoS under the constrained MMSE criterion. To verify this experimentally, the effect of using the noisy PoS for signal resynthesis is analysed through objective and subjective measures. The results show that when the Instantaneous SNR (ISNR) is above 0 dB, a recovery of the original speech signal that is deemed perfect can be obtained by modifying only the DCT absolute spectrum. In contrast, an accurate estimate of the DFT Phase Spectrum (PhS) might be required to achieve the same improvement in perceived speech quality. When perceived quality is measured against the Segmental SNR (SSNR), the PoS preserves speech quality better than the PhS at the same level of global distortion. Overall, the noisy PoS can be used as an estimate of the clean PoS without perceivable degradation in speech quality only if the ISNR of the noisy speech signal is above 0 dB or the SSNR is above 10.5 dB.
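The AS/PoS decomposition and the noisy-PoS resynthesis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single frame rather than a full short-time analysis, an orthonormal DCT-II, a synthetic two-tone signal in place of speech, and white Gaussian noise. The function names (`split_dct`, `resynthesize`) are hypothetical, and the per-coefficient SNR used for the mask is only a stand-in for the ISNR, whose exact definition the abstract does not give.

```python
import numpy as np
from scipy.fft import dct, idct


def split_dct(frame):
    """Return (absolute spectrum, polarity spectrum) of one frame."""
    coeffs = dct(frame, type=2, norm="ortho")  # orthonormal DCT-II (assumed)
    return np.abs(coeffs), np.sign(coeffs)


def resynthesize(abs_spec, pol_spec):
    """Rebuild a time-domain frame from an AS and a PoS."""
    return idct(abs_spec * pol_spec, type=2, norm="ortho")


# Synthetic stand-in for one speech frame plus additive noise (assumed data).
rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
clean = np.sin(2 * np.pi * 0.03 * t) + 0.5 * np.sin(2 * np.pi * 0.11 * t)
noise = 0.3 * rng.standard_normal(n)
noisy = clean + noise

clean_as, clean_pos = split_dct(clean)
noisy_as, noisy_pos = split_dct(noisy)
noise_as, _ = split_dct(noise)

# Clean AS with clean PoS reproduces the frame exactly (up to rounding).
roundtrip = resynthesize(clean_as, clean_pos)

# Clean AS with noisy PoS: the "only the absolute spectrum is corrected" case.
hybrid = resynthesize(clean_as, noisy_pos)

# Coefficients whose per-coefficient SNR exceeds 0 dB (stand-in for ISNR).
mask = clean_as > noise_as
match_rate_all = np.mean(noisy_pos == clean_pos)
match_rate_high_snr = np.mean(noisy_pos[mask] == clean_pos[mask])
```

Because the DCT is linear, a coefficient whose clean magnitude exceeds the noise magnitude cannot change sign when the noise is added, so `match_rate_high_snr` is 1 in this sketch. That is consistent with the abstract's 0 dB threshold, although the paper's own analysis is based on the constrained MMSE criterion and listening tests rather than on this simple argument.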

