Analysis of singing voice for epoch extraction using Zero Frequency Filtering method
Authors: Sudarsana Reddy Kadiri, B. Yegnanarayana
Venue: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication date: 2015-04-19
DOI: 10.1109/ICASSP.2015.7178774 (https://doi.org/10.1109/ICASSP.2015.7178774)
Citations: 25
Abstract
An epoch is the instant of significant excitation of the vocal tract system during the production of voiced speech. Estimation of epochs, or glottal closure instants (GCIs), is a well-studied topic in speech analysis. Recent studies that apply state-of-the-art speech GCI detection methods to singing voice show a clear gap in accuracy between speech and singing voice. This gap is attributed to the higher source-filter interaction in singing voice compared to speech. The performance of existing algorithms deteriorates because most of the techniques depend on the ability to model the vocal tract system in order to emphasize the excitation characteristics in the residual. The objective of this paper is to analyze singing voice for the estimation of epochs by studying the characteristics of the source-filter interaction and the effect of the wider pitch range, using the Zero Frequency Filtering (ZFF) method. It is observed that high source-filter interaction can be captured in the form of impulse-like excitation by passing the signal through three ideal digital resonators having poles at zero frequency, and that the effect of the wider pitch range can be controlled by processing short segments (0.4-0.5 s) of the signal.
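The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes each "ideal digital resonator" is the usual second-order section with a double pole at z = 1, i.e. y[n] = 2y[n-1] - y[n-2] + x[n]. The abstract does not specify the differencing, the local-mean trend removal, or the zero-crossing picking, so those steps follow the commonly used ZFF formulation and are assumptions here. The function name, its parameters, and the 1.5-pitch-period window are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def zero_frequency_filter(segment, fs, pitch_period_s, num_resonators=3):
    """Return (zff_signal, candidate_epoch_indices) for one short segment.

    Hypothetical helper: names and defaults are illustrative assumptions.
    The segment is expected to be much longer than the trend window.
    """
    s = np.asarray(segment, dtype=np.float64)

    # Difference the signal to remove any DC offset before integration.
    x = np.diff(s, prepend=0.0)

    # One zero-frequency resonator, y[n] = 2y[n-1] - y[n-2] + x[n], has a
    # double pole at z = 1, which equals two running sums in cascade.
    y = x
    for _ in range(num_resonators):
        y = np.cumsum(np.cumsum(y))

    # The output grows polynomially, so subtract a local mean taken over
    # roughly 1.5 pitch periods, once per resonator (an assumption).
    half_win = int(round(0.75 * pitch_period_s * fs))
    kernel = np.ones(2 * half_win + 1) / (2 * half_win + 1)
    for _ in range(num_resonators):
        y = y - np.convolve(y, kernel, mode="same")

    # Samples near the edges saw a truncated window, so skip them.
    margin = num_resonators * half_win
    core = y[margin:len(y) - margin]

    # Candidate epochs: negative-to-positive zero crossings in the interior.
    rising = np.flatnonzero((core[:-1] < 0) & (core[1:] > 0)) + 1 + margin
    return y, rising


# Usage on a synthetic 0.5 s impulse train at 200 Hz (16 kHz sampling).
fs, f0 = 16000, 200
segment = np.zeros(int(0.5 * fs))
segment[40::fs // f0] = 1.0
zff_signal, epochs = zero_frequency_filter(segment, fs, 1.0 / f0)
print(np.diff(epochs)[:5])  # spacing between successive candidate epochs
```

Keeping the segment short, as the abstract suggests, also limits the polynomial growth of the integrator outputs, which helps numerically in double precision. Which crossing direction marks the epoch depends on the signal polarity and on the number of resonators, since each added resonator inverts the sign of the filtered signal. In practice the direction should be checked against reference data.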