{"title":"多状态驳船模型的判别训练","authors":"A. Ljolje, Vincent Goffin","doi":"10.1109/ASRU.2007.4430137","DOIUrl":null,"url":null,"abstract":"A barge-in system designed to reflect the design of the acoustic model used in commercial applications has been built and evaluated. It uses standard hidden Markov model structures, cepstral features and multiple hidden Markov models for both the speech and non-speech parts of the model. It is tested on a large number of real-world databases using noisy speech onset positions which were determined by forced alignment of lexical transcriptions with the recognition model. The ML trained model achieves low false rejection rates at the expense of high false acceptance rates. The discriminative training using the modified algorithm based on the maximum mutual information criterion reduces the false acceptance rates by a half, while preserving the low false rejection rates. Combining an energy based voice activity detector with the hidden Markov model based barge-in models achieves the best performance.","PeriodicalId":371729,"journal":{"name":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Discriminative training of multi-state barge-in models\",\"authors\":\"A. Ljolje, Vincent Goffin\",\"doi\":\"10.1109/ASRU.2007.4430137\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A barge-in system designed to reflect the design of the acoustic model used in commercial applications has been built and evaluated. It uses standard hidden Markov model structures, cepstral features and multiple hidden Markov models for both the speech and non-speech parts of the model. It is tested on a large number of real-world databases using noisy speech onset positions which were determined by forced alignment of lexical transcriptions with the recognition model. The ML trained model achieves low false rejection rates at the expense of high false acceptance rates. The discriminative training using the modified algorithm based on the maximum mutual information criterion reduces the false acceptance rates by a half, while preserving the low false rejection rates. Combining an energy based voice activity detector with the hidden Markov model based barge-in models achieves the best performance.\",\"PeriodicalId\":371729,\"journal\":{\"name\":\"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2007.4430137\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2007.4430137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Discriminative training of multi-state barge-in models
A barge-in system designed to reflect the design of the acoustic model used in commercial applications has been built and evaluated. It uses standard hidden Markov model structures, cepstral features and multiple hidden Markov models for both the speech and non-speech parts of the model. It is tested on a large number of real-world databases using noisy speech onset positions which were determined by forced alignment of lexical transcriptions with the recognition model. The ML trained model achieves low false rejection rates at the expense of high false acceptance rates. The discriminative training using the modified algorithm based on the maximum mutual information criterion reduces the false acceptance rates by a half, while preserving the low false rejection rates. Combining an energy based voice activity detector with the hidden Markov model based barge-in models achieves the best performance.