Pitch estimation from speech using Grating Compression Transform on Modified Group-Delay-gram

2015 Twenty First National Conference on Communications (NCC) | Pub Date: 2015-02-01 | DOI: 10.1109/NCC.2015.7084899

J. Sebastian, M. Kumar, H. Murthy


Citations: 3

Abstract

This work presents an approach to pitch extraction that applies the Grating Compression Transform (GCT) to a harmonically enhanced Modified Group-Delay-gram (Modgdgram). It exploits the peakedness and high-resolution properties of group delay functions, together with the ability of the GCT to smear harmonically related components in the spectrum and to track pitch across frames. The power spectrum of the signal is divided by a cepstrally smoothed version of itself to obtain a flattened spectrum. Because pitch produces picket-fence harmonics in the flattened spectrum, that spectrum resembles a sinusoid corrupted by noise. It is therefore treated as a sinusoidal signal and analyzed with modified group delay processing. Localized time-frequency regions of the Modgdgram are used for GCT computation. Peaks are picked in the resulting rate-scale domain, and pitch dynamics are used to finalize the pitch values. The proposed algorithm, without any post-processing, is compared with the traditional GCT computed on the magnitude spectrum and with modified group delay alone. Both natural and synthetic speech are used for evaluation, and an overall improvement of 27% is obtained in the error measures. Finally, two commonly used advanced algorithms that include post-processing steps are also considered, and the results obtained are comparable.
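The spectral flattening step in the abstract can be illustrated with a minimal sketch. It assumes NumPy; the function name `flatten_spectrum`, the Hamming window, the FFT size, and the lifter cutoff are illustrative choices and not values taken from the paper. The modified group delay and GCT stages are not shown.

```python
import numpy as np

def flatten_spectrum(frame, n_fft=1024, lifter_cutoff=20):
    """Divide a frame's power spectrum by its cepstrally smoothed version.

    All parameter values here are hypothetical defaults, not the paper's.
    """
    windowed = frame * np.hamming(len(frame))
    # Power spectrum; the small constant avoids log(0) at spectral nulls.
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2 + 1e-12
    log_power = np.log(power)

    # Real cepstrum of the log power spectrum (even-symmetric, length n_fft).
    cepstrum = np.fft.irfft(log_power, n_fft)

    # Symmetric low-quefrency lifter: keeps the slowly varying envelope
    # and discards the pitch-related high-quefrency part.
    lifter = np.zeros(n_fft)
    lifter[:lifter_cutoff] = 1.0
    if lifter_cutoff > 1:
        lifter[-(lifter_cutoff - 1):] = 1.0

    # Back to the frequency domain: smoothed log spectrum, then envelope.
    smooth_log = np.fft.rfft(cepstrum * lifter).real
    envelope = np.exp(smooth_log)

    # Flattened spectrum: harmonics remain, the spectral envelope is removed.
    return power / envelope

if __name__ == "__main__":
    fs = 16000                       # assumed sampling rate in Hz
    t = np.arange(640) / fs          # one 40 ms frame
    # Synthetic voiced frame: harmonics of 200 Hz with decaying amplitudes.
    frame = sum(np.cos(2 * np.pi * 200 * k * t) / k for k in range(1, 40))
    flat = flatten_spectrum(frame)
    print(flat.shape)
```

For the synthetic frame, the flattened spectrum is larger at harmonic frequencies than between them, which is the picket-fence pattern that the abstract goes on to treat as a noisy sinusoid.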
