Pitch estimation from speech using Grating Compression Transform on Modified Group-Delay-gram

J. Sebastian, M. Kumar, H. Murthy
2015 Twenty First National Conference on Communications (NCC), February 2015. DOI: 10.1109/NCC.2015.7084899 (https://doi.org/10.1109/NCC.2015.7084899)

Abstract

This work presents an approach to pitch extraction that applies the Grating Compression Transform (GCT) to a harmonically enhanced Modified Group-Delay-gram (Modgdgram). It exploits the peakedness and high-resolution properties of group delay functions, together with the ability of the GCT to smear harmonically related components in the spectrum and to track pitch across frames. The power spectrum of the signal is divided by a cepstrally smoothed version of itself to obtain a flattened spectrum. Because pitch produces picket-fence harmonics in the flattened spectrum, that spectrum resembles a sinusoid corrupted by noise. It is therefore treated as a sinusoidal signal, and modified group delay based analysis is performed on it. Localized time-frequency regions of the Modgdgram are used for GCT computation. Peak picking is performed in the resulting rate-scale domain, and pitch dynamics are used to finalize the pitch values. The proposed algorithm, without any post-processing, is compared with the traditional GCT computed on the magnitude spectrum and with the modified group delay alone. Both natural and synthetic speech are considered for evaluation, and an overall improvement of 27% is obtained in the error measures. Finally, two commonly used advanced algorithms that include post-processing steps are also considered, and the results obtained are comparable.
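
The spectral flattening step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `flatten_spectrum`, the FFT size `n_fft`, the lifter length `n_lifter`, and the Hann window are all assumptions chosen for illustration, since the abstract does not give these details. The sketch computes the power spectrum of one frame, smooths its logarithm by keeping only the low-quefrency cepstral coefficients, and divides the power spectrum by the resulting envelope.

```python
import numpy as np


def flatten_spectrum(frame, n_fft=1024, n_lifter=20):
    """Divide a frame's power spectrum by its cepstrally smoothed envelope.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns the flattened spectrum for bins 0 .. n_fft // 2.
    """
    # Window the frame and compute the power spectrum (small floor avoids log(0)).
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.fft(windowed, n_fft)) ** 2 + 1e-12

    # Real cepstrum of the log power spectrum.
    cepstrum = np.fft.ifft(np.log(power)).real

    # Low-quefrency lifter, kept symmetric so the smoothed log spectrum stays real.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[-(n_lifter - 1):] = 1.0

    # Smoothed spectral envelope, then flattening by division.
    envelope = np.exp(np.fft.fft(cepstrum * lifter).real)
    flat = power / envelope
    return flat[: n_fft // 2 + 1]


if __name__ == "__main__":
    # Synthetic voiced frame: 200 Hz fundamental with ten decaying harmonics.
    fs = 16000
    t = np.arange(512) / fs
    frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 11))
    flat = flatten_spectrum(frame)
    print(flat.shape)
```

In the flattened output, the slowly varying envelope is largely removed while the harmonic peaks remain, which is the near-periodic structure along frequency that the abstract likens to a noisy sinusoid. The lifter length must stay below the pitch period in samples so that the pitch-related cepstral peak is excluded from the envelope.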