Expressive speech analysis for epoch extraction using zero frequency filtering approach
Authors: D. Pravena, D. Govind
Published in: 2016 IEEE Students’ Technology Symposium (TechSym), September 2016
DOI: 10.1109/TECHSYM.2016.7872689 (https://doi.org/10.1109/TECHSYM.2016.7872689)
Citations: 7
Abstract
The present work discusses the issues of epoch extraction from expressive speech signals. Epochs are the glottal closure instants in voiced speech, which in turn mark the instants of maximum excitation of the vocal tract. Although many existing epoch extraction methods give near-perfect estimates on clean or neutral speech, their performance drops significantly on expressive speech signals, because the rapid and uncontrolled pitch variations in such signals degrade epoch extraction. The objective of the present work is to use the zero frequency filtering (ZFF) approach to improve epoch extraction from speech carrying various expressions that are perceptually distinct from neutral speech. To capture the rapid and uncontrolled variations in expressive utterances, trend removal is performed on short segments (25 ms) of the output of a cascade of three zero frequency resonators (ZFRs). The epoch estimation performance of the proposed method is compared with that of the conventional ZFF method, an existing refined ZFF method proposed for expressive speech, and the recently proposed zero band filtering (ZBF) approach. The effectiveness of the approach is confirmed by a higher epoch identification rate and lower miss and false alarm rates than those of the existing methods.
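The processing chain summarised above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The resonator recursion is the standard zero frequency resonator of conventional ZFF, cascaded three times as in the abstract. The abstract does not say how the trend is removed within each 25 ms segment, so the sketch assumes a least-squares polynomial fit of degree 2R - 1 (R = number of resonators) on non-overlapping segments, and it takes positive-going zero crossings of the detrended output as epochs. The function names `zero_frequency_resonators`, `detrend_segment` and `zff_epochs` and their parameters are hypothetical.

```python
from itertools import accumulate


def zero_frequency_resonators(x, n_resonators=3):
    """Pass x through a cascade of ideal zero frequency resonators.

    Each resonator y[n] = 2*y[n-1] - y[n-2] + x[n] has a double pole at
    z = 1, so it equals two running sums with zero initial conditions.
    """
    y = list(x)
    for _ in range(2 * n_resonators):
        y = list(accumulate(y))
    return y


def detrend_segment(seg, degree):
    """Subtract the least-squares polynomial trend of the given degree.

    Assumption: per-segment trend removal is modelled as a polynomial fit;
    the abstract only states that it is done on 25 ms segments.
    """
    m = len(seg)
    # Positions scaled to [-1, 1] keep the monomial basis well conditioned.
    t = [2.0 * i / (m - 1) - 1.0 for i in range(m)] if m > 1 else [0.0]
    basis = []
    for k in range(min(degree + 1, m)):
        v = [ti ** k for ti in t]
        # Modified Gram-Schmidt: orthogonalise against earlier basis vectors.
        for q in basis:
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    # Remove the projection of the segment onto the polynomial subspace.
    r = [float(s) for s in seg]
    for q in basis:
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return r


def zff_epochs(speech, fs, n_resonators=3, segment_ms=25.0):
    """Return sample indices of positive-going zero crossings (epochs)."""
    # Difference the signal to remove any DC offset, as in conventional ZFF.
    x = [0.0] + [b - a for a, b in zip(speech, speech[1:])]
    y = zero_frequency_resonators(x, n_resonators)
    seg_len = max(2, int(round(segment_ms * 1e-3 * fs)))
    # Assumption: 2*R running sums of a differenced signal leave a trend
    # that is roughly a polynomial of degree 2*R - 1 within a short segment.
    degree = 2 * n_resonators - 1
    epochs = []
    for start in range(0, len(y), seg_len):
        z = detrend_segment(y[start:start + seg_len], degree)
        # Crossings are searched only inside a segment; one that falls
        # exactly on a segment boundary is not reported by this sketch.
        for i in range(1, len(z)):
            if z[i - 1] < 0.0 <= z[i]:
                epochs.append(start + i)
    return epochs
```

For example, `zff_epochs(samples, 8000)` returns epoch indices for a list of samples recorded at 8 kHz. Which crossing direction marks the epochs depends on the polarity of the recording and on the number of resonators in the cascade, so the convention is worth checking on data with reference epochs.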
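The abstract reports results as identification, miss and false alarm rates but does not define them. A common convention in glottal closure instant (GCI) evaluation, assumed here and possibly different in detail from the paper's own protocol, splits the signal into larynx cycles around each reference epoch and counts the detections in each cycle: exactly one is an identification, none is a miss, and more than one is a false alarm. A minimal sketch (the function name `gci_metrics` is hypothetical):

```python
def gci_metrics(reference, detected):
    """Return (identification rate, miss rate, false alarm rate).

    Assumption: reference epoch k owns the larynx cycle running from the
    midpoint with epoch k-1 to the midpoint with epoch k+1; the first and
    last reference epochs have no complete cycle and are skipped.
    """
    hits = misses = false_alarms = 0
    for k in range(1, len(reference) - 1):
        lo = (reference[k - 1] + reference[k]) / 2
        hi = (reference[k] + reference[k + 1]) / 2
        # Count detected epochs in the half-open cycle [lo, hi).
        count = sum(1 for d in detected if lo <= d < hi)
        if count == 1:
            hits += 1
        elif count == 0:
            misses += 1
        else:
            false_alarms += 1
    total = hits + misses + false_alarms
    if total == 0:
        return 0.0, 0.0, 0.0
    return hits / total, misses / total, false_alarms / total
```

For instance, `gci_metrics([100, 200, 300, 400, 500], [205, 290, 310, 510])` finds one identification, one false alarm and one miss across the three complete cycles, giving a rate of 1/3 for each.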