Improved Epoch Extraction Using Variational Mode Decomposition Based Spectral Smoothing of Zero Frequency Filtered Emotive Speech Signals
Authors: D. Govind, D. Pravena, S. Ajay
Venue: 2018 Twenty Fourth National Conference on Communications (NCC)
Publication date: 2018-02-01
DOI: 10.1109/NCC.2018.8600091 (https://doi.org/10.1109/NCC.2018.8600091)
Citations: 2
Abstract
This work aims to improve epoch extraction from emotive speech by adding a post-processing stage, based on variational mode decomposition (VMD) spectral smoothing, to the conventional zero frequency filtering (ZFF) method. Because pitch varies rapidly and without control in emotive speech, reliable epoch estimation is challenging. In the proposed method, the spectra of short frames of the zero frequency filtered signal (ZFFS) are subjected to variational mode decomposition to obtain component spectra in five modes. A smoothed short-time spectrum is then obtained by excluding the two higher VMD modes, which carry most of the high spectral variation. The modified ZFFS is reconstructed by parameter-interpolation-based sinusoidal synthesis, using the sinusoidal parameters of the single dominant frequency present in each VMD-smoothed spectrum. The re-synthesized ZFFS has fewer spurious zero crossings than the ZFFS obtained from the conventional ZFF method for emotive speech signals. The effectiveness of the proposed VMD-based spectral post-processing is confirmed by improved epoch identification rate and epoch identification accuracy across all emotive utterances (covering 7 emotions) in the German emotion speech database, which provides simultaneous speech and electroglottographic (EGG) recordings. In terms of epoch identification accuracy, measured against reference epochs estimated from the EGG signals, the proposed method performs better than or comparably to existing ZFF-based post-processing methods proposed for emotive speech.
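For orientation, the conventional ZFF stage that the abstract takes as its starting point can be sketched as below. This is a minimal pure-Python sketch of the commonly described ZFF procedure (difference the signal, pass it through a cascade of two zero-frequency resonators, remove the trend by repeated local-mean subtraction, and take positive-going zero crossings as epochs), not the authors' implementation. It does not include the VMD smoothing or sinusoidal re-synthesis. The function names, the 1.5-pitch-period trend window, the three trend-removal passes, and the truncated-window edge handling are all assumptions made for illustration.

```python
def subtract_local_mean(y, half_window):
    """Subtract the mean over [i - half_window, i + half_window] from each sample.

    Windows are truncated at the signal edges (an illustrative choice), so the
    first and last few windows' worth of output should not be trusted.
    """
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(y[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zero_frequency_filter(signal, fs, mean_pitch_period_s=0.005, trend_passes=3):
    """Return a zero frequency filtered signal (ZFFS) for a list of samples.

    mean_pitch_period_s and trend_passes are assumed tuning parameters.
    """
    if not signal:
        return []
    # 1. First-order difference removes any slowly varying bias.
    y = [signal[0]] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    # 2. Each zero-frequency resonator, y[n] = 2*y[n-1] - y[n-2] + x[n], is a
    #    double integrator, so a cascade of two equals four running sums.
    for _ in range(4):
        acc = 0.0
        summed = []
        for v in y:
            acc += v
            summed.append(acc)
        y = summed
    # 3. Remove the polynomial trend with a window of about 1.5 pitch periods.
    half_window = max(1, int(round(0.75 * mean_pitch_period_s * fs)))
    for _ in range(trend_passes):
        y = subtract_local_mean(y, half_window)
    return y


def find_epochs(zffs):
    """Indices where the ZFFS crosses zero from negative to non-negative."""
    return [i for i in range(1, len(zffs)) if zffs[i - 1] < 0.0 <= zffs[i]]
```

Calling `find_epochs(zero_frequency_filter(samples, fs, mean_pitch_period_s=...))` on a voiced segment yields candidate epoch locations as sample indices. In emotive speech, where pitch changes quickly, a single fixed trend window is a poor fit for parts of the utterance, which is one way the spurious zero crossings mentioned in the abstract can arise.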