D. Govind, S. Priya, S. Akarsh, B. G. Gowri, K. Soman
Improved Epoch Extraction From Speech Signals Using Wavelet Synchrosqueezed Transform
2019 National Conference on Communications (NCC), pp. 1-5. Published 2019-02-01. DOI: 10.1109/NCC.2019.8732259 (https://doi.org/10.1109/NCC.2019.8732259)
Citations: 1
Abstract
The synchrosqueezed wavelet transform (WSST) is an effective tool for tracking the instantaneous frequency of a signal. The objective of the present work is to propose a WSST-based method for accurate epoch estimation from speech. Epochs in speech are the instants at which the excitation to the vocal tract is maximum, and the instantaneous $F_{0}$ contour is derived from the epoch locations. The hypothesis proposed in this paper is that the signal reconstructed by discarding the higher-frequency modes (above the mean $F_{0}$) in the WSST time-frequency domain is observed to predominantly represent source characteristics. The presence of source characteristics in the modified WSST-reconstructed signal is validated by the improved identification accuracy obtained for epochs estimated from clean speech utterances of the CMU-Arctic database. To further demonstrate the effectiveness of the WSST in improving overall epoch estimation performance, a WSST-modified version of zero frequency filtering (ZFF) of speech, a simple and effective tool for epoch extraction, is proposed. The sharp instantaneous frequency representation provided by the WSST is also found to be effective for estimating epochs in emotional utterances, where rapid pitch variations are present. The improved epoch estimation performance on emotive utterances is confirmed by validation on the German emotion speech corpus (EmoDb).
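For readers unfamiliar with ZFF, the following is a minimal sketch of the conventional (baseline) zero frequency filtering procedure and of deriving an $F_{0}$ track from epoch intervals. It is not the WSST-modified method proposed in the paper, whose details the abstract does not give. The function names, the 1.5-pitch-period trend-removal window, the three trend-removal passes, and the synthetic negative impulse train standing in for the glottal excitation are all assumptions made for illustration. The sketch uses only the Python standard library.

```python
def zero_frequency_filter(signal, fs, mean_f0, passes=3):
    """Baseline ZFF sketch. Returns (zff_signal, half_window_in_samples)."""
    # Step 1: first difference, to suppress any DC offset in the input.
    x = [float(signal[0])] + [
        float(signal[n] - signal[n - 1]) for n in range(1, len(signal))
    ]

    # Step 2: two cascaded zero-frequency resonators, each with a double
    # pole at z = 1:  y[n] = 2*y[n-1] - y[n-2] + x[n]
    y = x
    for _ in range(2):
        out, prev1, prev2 = [], 0.0, 0.0
        for v in y:
            cur = 2.0 * prev1 - prev2 + v
            out.append(cur)
            prev1, prev2 = cur, prev1
        y = out

    # Step 3: remove the polynomial trend by repeatedly subtracting the
    # local mean over about 1.5 average pitch periods (assumed window size).
    # Windows are truncated at the edges, so edge samples are unreliable.
    half = int(round(0.75 * fs / mean_f0))
    total = len(y)
    for _ in range(passes):
        detrended = []
        for n in range(total):
            lo, hi = max(0, n - half), min(total, n + half + 1)
            detrended.append(y[n] - sum(y[lo:hi]) / (hi - lo))
        y = detrended
    return y, half


def epochs_from_zff(zff, margin):
    """Negative-to-positive zero crossings, skipping `margin` edge samples."""
    start = max(1, margin)
    return [
        n for n in range(start, len(zff) - margin)
        if zff[n - 1] < 0.0 <= zff[n]
    ]


def f0_from_epochs(epochs, fs):
    """Instantaneous F0 (Hz) from the spacing of successive epochs."""
    return [fs / (b - a) for a, b in zip(epochs, epochs[1:])]


# Synthetic excitation (assumption): negative unit impulses at 100 Hz.
fs, f0_true = 8000, 100.0
period = int(fs / f0_true)
excitation = [-1.0 if n % period == 0 else 0.0 for n in range(4000)]

zff, half = zero_frequency_filter(excitation, fs, mean_f0=f0_true)
epochs = epochs_from_zff(zff, margin=4 * half)
f0_track = f0_from_epochs(epochs, fs)
print(epochs[:5], f0_track[:3])
```

The detected zero crossings should fall within a few samples of the impulse locations, and their spacing gives the $F_{0}$ estimate. Which zero-crossing direction marks the epochs depends on signal polarity, so the negative-impulse convention used here is a choice of the sketch and not a property of the method.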