Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers
C. Leung, B. Ma, Haizhou Li
2012 8th International Symposium on Chinese Spoken Language Processing, 2012-12-01
DOI: 10.1109/ISCSLP.2012.6423509 (https://doi.org/10.1109/ISCSLP.2012.6423509)
In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions. Combining diversified phonotactic features is also common practice. These two observations motivate an in-depth investigation of combining diversified phonotactic features derived from diversely adapted acoustic models. Our experiment shows that this approach achieves an equal error rate (EER) of 1.94% in the 30-second closed-set trials of the 2007 NIST Language Recognition Evaluation (LRE). This represents a 14.9% relative improvement in EER over a sophisticated system that uses parallel phone recognizers, speaker adaptive training (SAT) of the acoustic models, and CMLLR adaptation. The approach also provides consistent and substantial improvements in three different phonotactic systems, each of which uses a single phone recognizer.
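To make the two reported figures concrete, the following is a minimal sketch of how an EER and a relative EER improvement are typically computed. It is not the authors' evaluation code: the function names and the toy scores are hypothetical, and the comparison system's EER is not stated in the abstract, so the value derived here is only an approximation implied by the rounded 1.94% and 14.9% figures.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return the EER from a simple threshold sweep (illustrative only).

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the miss rate and the false-alarm rate
    are closest, and is reported as their average.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (miss + false_alarm) / 2
    return best_eer


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction: (baseline - new) / baseline."""
    return (baseline_eer - new_eer) / baseline_eer


def implied_baseline(new_eer, rel_improvement):
    """Invert the relative-improvement formula to recover the baseline EER."""
    return new_eer / (1.0 - rel_improvement)


# Toy scores, made up for illustration (not LRE 2007 data).
toy_targets = [0.9, 0.8, 0.7, 0.3]
toy_nontargets = [0.6, 0.4, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)

# Figures from the abstract: 1.94% EER, 14.9% relative improvement.
# The comparison-system EER below is inferred, not reported.
approx_baseline = implied_baseline(1.94, 0.149)

print(f"toy EER: {toy_eer:.2f}")
print(f"implied comparison-system EER: about {approx_baseline:.2f}%")
```

Under these assumptions, the comparison system's EER works out to roughly 2.28%. Because both published figures are rounded, this should be read as an estimate only.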