Improved Epoch Extraction Using Variational Mode Decomposition Based Spectral Smoothing of Zero Frequency Filtered Emotive Speech Signals

2018 Twenty Fourth National Conference on Communications (NCC) Pub Date : 2018-02-01 DOI:10.1109/NCC.2018.8600091

D. Govind, D. Pravena, S. Ajay

{"title":"Improved Epoch Extraction Using Variational Mode Decomposition Based Spectral Smoothing of Zero Frequency Filtered Emotive Speech Signals","authors":"D. Govind, D. Pravena, S. Ajay","doi":"10.1109/NCC.2018.8600091","DOIUrl":null,"url":null,"abstract":"The objective of the present work is to improve the epoch extraction performance from emotive speech by proposing a post processing approach to the conventional zero frequency filtering (ZFF) method using variational mode decomposition (VMD) based spectral smoothing. Due to the fast uncontrolled variations of the pitch in emotive speech signals, the reliable estimation of epochs is always challenging. In the proposed method, the spectra of the short frames of zero frequency filtered signal (ZFFS) is subjected variational mode decomposition to get component spectra in five modes. A smoothed short time spectra is then obtained by excluding the spectra from the two higher VMD modes which essentially have the high spectral variations. The modified ZFFS is then reconstructed using the sinusoidal parameters corresponding to single dominant frequency present in the smoothed spectra using VMD by parameter interpolation based sinusoidal synthesis. The resulting re-synthesized ZFFS has reduced spurious zero crossings as compared to that obtained from the conventional ZFF method for emotive speech signals. The effectiveness of the proposed VMD based spectral post processing is confirmed from the improved epoch identification rate and epoch identification accuracy across all the emotive utterances (with 7 emotions) present in German emotion speech database having simultaneous speech and electroglottographic (EGG) signal recordings. The performance of the proposed method is found to be better or comparable with the other existing ZFF based post processing methods proposed for emotive speech signals in terms of the epoch identification accuracy with respect to the corresponding reference epochs estimated from EGG signals.","PeriodicalId":121544,"journal":{"name":"2018 Twenty Fourth National Conference on Communications (NCC)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Twenty Fourth National Conference on Communications (NCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NCC.2018.8600091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

Abstract

The objective of the present work is to improve the epoch extraction performance from emotive speech by proposing a post processing approach to the conventional zero frequency filtering (ZFF) method using variational mode decomposition (VMD) based spectral smoothing. Due to the fast uncontrolled variations of the pitch in emotive speech signals, the reliable estimation of epochs is always challenging. In the proposed method, the spectra of the short frames of zero frequency filtered signal (ZFFS) is subjected variational mode decomposition to get component spectra in five modes. A smoothed short time spectra is then obtained by excluding the spectra from the two higher VMD modes which essentially have the high spectral variations. The modified ZFFS is then reconstructed using the sinusoidal parameters corresponding to single dominant frequency present in the smoothed spectra using VMD by parameter interpolation based sinusoidal synthesis. The resulting re-synthesized ZFFS has reduced spurious zero crossings as compared to that obtained from the conventional ZFF method for emotive speech signals. The effectiveness of the proposed VMD based spectral post processing is confirmed from the improved epoch identification rate and epoch identification accuracy across all the emotive utterances (with 7 emotions) present in German emotion speech database having simultaneous speech and electroglottographic (EGG) signal recordings. The performance of the proposed method is found to be better or comparable with the other existing ZFF based post processing methods proposed for emotive speech signals in terms of the epoch identification accuracy with respect to the corresponding reference epochs estimated from EGG signals.

查看原文本刊更多论文

基于变分模分解的零频率滤波情绪语音信号谱平滑改进历元提取

本文的目的是通过提出一种基于变分模态分解(VMD)的频谱平滑的后处理方法来改进从情绪语音中提取历元的性能。由于情绪语音信号中音高的快速不受控制的变化，可靠的时代估计一直是一个挑战。该方法对零频滤波信号的短帧谱进行变分模态分解，得到五种模态的分量谱。然后通过排除两个高VMD模式的光谱得到平滑的短时间光谱，这两个模式本质上具有高光谱变化。然后利用基于参数插值的正弦合成方法，利用VMD平滑谱中单个主频率对应的正弦参数重构改进后的ZFFS。与传统的ZFF方法获得的情感语音信号相比，由此产生的重新合成的ZFFS减少了虚假的过零。在德语情绪语音数据库中，同时记录语音和声门电信号的所有情绪话语(含7种情绪)的历元识别率和历元识别准确率均有所提高，从而证实了基于VMD的频谱后处理的有效性。与现有的基于ZFF的情感语音信号后处理方法相比，基于EGG信号估计的相应参考epoch的历元识别精度更好或相当。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2018 Twenty Fourth National Conference on Communications (NCC)

自引率

0.00%

发文量