Segmental intensity and HMM modeling
P. Dumouchel, D. O'Shaughnessy
Proceedings 1995 Canadian Conference on Electrical and Computer Engineering, vol. 2, published 1995-09-05
DOI: 10.1109/CCECE.1995.526596
Citations: 0
Abstract
We propose a stochastic segmental intensity model, independent of the HMM, for INRS's large-vocabulary continuous speech recognizer. First, we examine how to insert this model into the search algorithm without violating the algorithm's optimality constraints. Second, we propose four intensity models and evaluate their performance. The models are trained and tested on a studio-quality, speaker-dependent speech corpus. The first model is a Gaussian mixture phone intensity model that is independent of the phonemic context. The second is a Gaussian mixture phone intensity model that depends on the right or left phonemic context. The third is a Gaussian mixture model of the variation of intensity within a diphone. The fourth is a stochastic silence/speech detector. Performance comparisons show that the best model is the third one, the Gaussian mixture of intensity variation within a diphone. This model improves the word recognition rate from 89.58% (no intensity modeling) to 90.92%.
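To make the third model more concrete, here is a minimal sketch, not the authors' implementation, of scoring a diphone's intensity variation with a one-dimensional Gaussian mixture and adding that log-likelihood to an HMM path score. Several details are assumptions made only for illustration, since the abstract does not specify them:

- the feature definition (mean frame intensity in dB of the second phone minus that of the first);
- the mixture parameters;
- the names `GaussianMixture1D`, `intensity_variation`, and `combined_score`;
- the weighted log-linear combination controlled by `weight`.

The sketch also does not address how the score is inserted into the search while preserving its optimality.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GaussianMixture1D:
    """One-dimensional Gaussian mixture with per-component weights, means, and variances."""
    weights: List[float]
    means: List[float]
    variances: List[float]

    def log_likelihood(self, x: float) -> float:
        # log( sum_k w_k * N(x; mu_k, var_k) ), evaluated with log-sum-exp for numerical stability
        log_terms = [
            math.log(w) - 0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mu) ** 2 / var
            for w, mu, var in zip(self.weights, self.means, self.variances)
        ]
        peak = max(log_terms)
        return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def intensity_variation(first_phone_db: Sequence[float], second_phone_db: Sequence[float]) -> float:
    # Hypothetical feature: mean frame intensity (dB) of the diphone's second phone
    # minus the mean frame intensity of its first phone
    mean_first = sum(first_phone_db) / len(first_phone_db)
    mean_second = sum(second_phone_db) / len(second_phone_db)
    return mean_second - mean_first


def combined_score(hmm_log_score: float, gmm: GaussianMixture1D,
                   delta_db: float, weight: float = 1.0) -> float:
    # Hypothetical log-linear combination: HMM path log score plus the weighted
    # log-likelihood of the observed intensity variation under the diphone's GMM
    return hmm_log_score + weight * gmm.log_likelihood(delta_db)


if __name__ == "__main__":
    # Illustrative, made-up mixture parameters for one diphone class
    gmm = GaussianMixture1D(weights=[0.6, 0.4], means=[-3.0, 5.0], variances=[4.0, 9.0])
    delta = intensity_variation([60.0, 62.0, 61.0], [55.0, 56.0, 57.0])  # -5.0 dB here
    print(delta, combined_score(-120.0, gmm, delta, weight=0.5))
```

Because both knowledge sources are combined in the log domain, the hypothetical `weight` acts as a scale factor between the HMM score and the intensity score. In a setup like this it would typically be tuned on held-out data.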