VTLN Through Frequency Warping Based on Pitch
C. Lopes, F. Perdigão
Anais do 2002 International Telecommunications Symposium, published 2003-04-30
DOI: https://doi.org/10.14209/JCIS.2003.10
Citations: 4
Abstract
This article describes a Vocal Tract Length Normalization (VTLN) procedure that performs frequency warping based on pitch estimates. The procedure aims to reduce the inter-speaker variability of speech signals in order to obtain a robust automatic speech recognition system. Two additional methods are also described: one for reducing environment variability and another for compensating for coarticulation effects in connected-word pronunciation. Environment variability is compensated for by explicitly modeling some frequent noise phenomena. Coarticulation compensation reduces speech signal variability by modeling events that result from coarticulation between adjacent models. Inter-speaker variability is removed by a traditional speaker normalization method, which consists of expanding or compressing the Mel filterbank bandwidths in order to normalize the Vocal Tract Length (VTL) of each speaker. Most existing methods for VTL estimation are based on formant estimation, whose difficulty is a known performance limitation. The proposed method overcomes this problem because it estimates the warping factor from pitch. The recognition results, obtained on a telephone digit recognition task (with phones and subwords as units), show that this procedure leads to improvements similar to those obtained with traditional formant-based methods, and outperforms them in some situations.
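As a rough illustration of the filterbank-warping idea in the abstract, the Python sketch below scales the edge frequencies of a Mel filterbank by a warping factor. The abstract does not give the pitch-to-warping-factor mapping, so `warp_factor_from_pitch` and its reference pitch, slope, and clipping range are hypothetical placeholders, not the authors' estimator. The linear scaling of filter edges and the convention that a factor above 1 expands the filterbank are also assumptions.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Convert Hz to Mel using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def warp_factor_from_pitch(pitch_hz, ref_pitch_hz=150.0, slope=0.12,
                           lo=0.88, hi=1.12):
    """HYPOTHETICAL mapping from mean pitch to a warping factor.

    Assumption: the factor grows with the log-ratio of the speaker's pitch
    to a reference pitch and is clipped to [lo, hi]. None of these
    constants come from the paper.
    """
    alpha = 1.0 + slope * np.log2(pitch_hz / ref_pitch_hz)
    return float(np.clip(alpha, lo, hi))


def warped_mel_edges(n_filters, sample_rate, alpha, f_low=0.0):
    """Return the n_filters + 2 edge frequencies (Hz) of a warped filterbank.

    Edges are first spaced uniformly on the Mel scale, then scaled linearly
    by alpha (an assumed warping function) and limited to the Nyquist
    frequency.
    """
    nyquist = sample_rate / 2.0
    mel_points = np.linspace(hz_to_mel(f_low), hz_to_mel(nyquist),
                             n_filters + 2)
    edges_hz = mel_to_hz(mel_points) * alpha
    return np.minimum(edges_hz, nyquist)


# Example: telephone-band speech (8 kHz sampling), assumed mean pitch 220 Hz.
alpha = warp_factor_from_pitch(220.0)
edges = warped_mel_edges(n_filters=24, sample_rate=8000, alpha=alpha)
```

Each speaker's triangular filters would then be built from `edges` in place of the unwarped edge frequencies, so the rest of the feature extraction stays unchanged.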