Improved Epoch Extraction From Speech Signals Using Wavelet Synchrosqueezed Transform

D. Govind, S. Priya, S. Akarsh, B. G. Gowri, K. Soman
2019 National Conference on Communications (NCC), pp. 1-5. Published 2019-02-01.
DOI: 10.1109/NCC.2019.8732259 (https://doi.org/10.1109/NCC.2019.8732259)
Citations: 1

Abstract

The synchrosqueezed wavelet transform (WSST) is an effective tool for tracking the instantaneous frequency of a signal. This work proposes a WSST-based method for accurate epoch estimation from speech. Epochs in speech are the instants at which the excitation to the vocal tract is maximum, and the instantaneous $F_{0}$ contour is derived from the epoch locations. The hypothesis of this paper is that the signal reconstructed by discarding the higher-frequency modes (above the mean $F_{0}$) in the WSST time-frequency domain predominantly represents source characteristics. The presence of source characteristics in the modified WSST-reconstructed signal is validated by the improved identification accuracy obtained for epochs estimated from clean speech utterances of the CMU-Arctic database. To further demonstrate the effectiveness of the WSST in improving overall epoch estimation performance, a WSST-modified version of zero frequency filtering (ZFF) of speech, one of the simple and effective tools for epoch extraction, is proposed. The sharp instantaneous frequency representation provided by the WSST is also found to be effective for estimating epochs in emotional utterances, where rapid pitch variations are present. The improved epoch estimation performance on emotive utterances is confirmed by validation on the German emotion speech corpus (EmoDb).
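
For readers unfamiliar with ZFF, the following is a minimal sketch of conventional zero frequency filtering as it is commonly described in the literature. It is not the WSST-modified variant proposed in the paper, and it is not the authors' implementation. The function names, the 1.5-pitch-period trend-removal window, and the number of trend-removal passes are illustrative assumptions; the mean $F_{0}$ is assumed to be known in advance.

```python
import numpy as np


def remove_local_mean(x, half_win):
    """Subtract the mean over a centered window of 2*half_win + 1 samples."""
    win = 2 * half_win + 1
    kernel = np.ones(win) / win
    # mode="same" zero-pads, so samples near the edges are unreliable.
    return x - np.convolve(x, kernel, mode="same")


def zff_epochs(speech, fs, mean_f0_hz, trend_passes=3):
    """Conventional ZFF sketch (assumed parameters, not the paper's method).

    Returns (epoch_indices, zff_signal).
    """
    x = np.asarray(speech, dtype=np.float64)

    # 1. Difference the signal to remove any DC offset.
    y = np.diff(x, prepend=x[0])

    # 2. Two cascaded zero-frequency resonators. Each resonator is a
    #    double integrator, so the cascade equals four cumulative sums.
    for _ in range(4):
        y = np.cumsum(y)

    # 3. Remove the polynomial trend by repeatedly subtracting the local
    #    mean over roughly 1.5 average pitch periods (assumed choice).
    half_win = max(1, int(round(0.75 * fs / mean_f0_hz)))
    for _ in range(trend_passes):
        y = remove_local_mean(y, half_win)

    # 4. Take negative-to-positive zero crossings as epoch candidates.
    #    Which crossing direction marks the epochs depends on signal polarity.
    epochs = np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1
    return epochs, y
```

On a synthetic impulse train with a fixed period, the interior crossings returned by `zff_epochs` should be spaced one period apart; crossings within a few window lengths of either end of the signal should be discarded because of the edge effects noted in the comments.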