Phonotactic spoken language recognition: Using diversely adapted acoustic models in parallel phone recognizers

2012 8th International Symposium on Chinese Spoken Language Processing. Pub Date: 2012-12-01. DOI: 10.1109/ISCSLP.2012.6423509

C. Leung, B. Ma, Haizhou Li


Citations: 1

Abstract



In phonotactic spoken language recognition systems, acoustic model adaptation prior to phone lattice decoding has been adopted to deal with the mismatch between training and test conditions. Combining diversified phonotactic features is also common practice. Together, these motivate an in-depth investigation of combining diversified phonotactic features obtained from diversely adapted acoustic models. Our experiment shows that this approach achieves an equal error rate (EER) of 1.94% in the 30-second closed-set trials of the 2007 NIST Language Recognition Evaluation (LRE). This represents a 14.9% relative improvement in EER over a sophisticated system that uses parallel phone recognizers, speaker adaptive training (SAT) in the acoustic models, and CMLLR adaptation. The approach also provides consistent and substantial improvements in three different phonotactic systems, each of which uses a single phone recognizer.
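The two figures quoted in the abstract (an EER of 1.94% and a 14.9% relative improvement) can be unpacked with a short sketch. The Python below is illustrative only and is not taken from the paper. The function names are hypothetical. The first function assumes relative improvement is defined as (comparison EER - new EER) / comparison EER, which is the usual convention but is not spelled out in the abstract. The second shows one simple way to approximate an EER from detection scores by sweeping thresholds, and the paper's own scoring procedure may differ.

```python
def implied_comparison_eer(new_eer: float, relative_improvement: float) -> float:
    """EER of the comparison system implied by the reported numbers.

    Assumes relative_improvement = (comparison - new) / comparison.
    """
    return new_eer / (1.0 - relative_improvement)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the miss rate and false-alarm rate are closest,
    and reported as the mean of the two rates at that point.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        miss = sum(s < t for s in target_scores) / len(target_scores)
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


if __name__ == "__main__":
    # Reported: EER 1.94%, 14.9% relative improvement.
    print(round(implied_comparison_eer(1.94, 0.149), 2))
    # Toy scores (made up for illustration, not from the evaluation data).
    print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))
```

Under the stated assumption, the comparison system's EER works out to roughly 2.28%. This value is derived arithmetic, not a number reported in the abstract.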

