{"title":"具有序列级Kullback-Leibler散度的无格最大互信息声学模型研究","authors":"Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu","doi":"10.1109/ASRU.2017.8268918","DOIUrl":null,"url":null,"abstract":"Lattice-free maximum mutual information (LFMMI) was recently proposed as a mixture of the ideas of hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from various perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. Especially, we thoroughly investigate the use of the “sequence-level” Kullback-Leibler divergence with its novel and simple error derivation to enhance LFMMI-based AMs. In our experiment, we used the corpus of spontaneous Japanese (CSJ). Our best AM was an ensemble of three types of time delay neural networks and one long short-term memory-based network, and it finally achieved a WER of 6.94%, which is, to the best of our knowledge, the best published result for the CSJ.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence\",\"authors\":\"Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu\",\"doi\":\"10.1109/ASRU.2017.8268918\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Lattice-free maximum mutual information (LFMMI) was recently proposed as a mixture of the ideas of hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from various perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. Especially, we thoroughly investigate the use of the “sequence-level” Kullback-Leibler divergence with its novel and simple error derivation to enhance LFMMI-based AMs. In our experiment, we used the corpus of spontaneous Japanese (CSJ). 
Our best AM was an ensemble of three types of time delay neural networks and one long short-term memory-based network, and it finally achieved a WER of 6.94%, which is, to the best of our knowledge, the best published result for the CSJ.\",\"PeriodicalId\":290868,\"journal\":{\"name\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2017.8268918\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268918","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence
Lattice-free maximum mutual information (LFMMI) was recently proposed as a blend of ideas from hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from the perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. In particular, we thoroughly investigate the use of a "sequence-level" Kullback-Leibler divergence, with a novel and simple error derivation, to enhance LFMMI-based AMs. In our experiments, we used the Corpus of Spontaneous Japanese (CSJ). Our best AM was an ensemble of three types of time-delay neural networks and one long short-term-memory-based network; it achieved a WER of 6.94%, which is, to the best of our knowledge, the best published result on the CSJ.
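For context, a minimal sketch of the two training criteria named above, in standard notation. The abstract does not reproduce the paper's own formulation, so the symbols here (O_u for the u-th utterance's acoustics, W for a word sequence, \kappa for the acoustic scale, P_T and P_S for teacher and student sequence posteriors) are generic assumptions rather than the authors' exact definitions:

% Conventional MMI criterion; in LFMMI the denominator sum over all
% competing sequences W is computed lattice-free with a phone-level
% denominator graph instead of word lattices.
\mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log
  \frac{p(O_u \mid W_u)^{\kappa} \, P(W_u)}
       {\sum_{W} p(O_u \mid W)^{\kappa} \, P(W)}

% A generic sequence-level KL divergence for teacher-student training:
% minimizing it drives the student posterior P_S toward the teacher's
% P_T over whole word sequences rather than per-frame labels.
\mathcal{F}_{\mathrm{KL}} = \sum_{u} \sum_{W}
  P_{T}(W \mid O_u) \, \log \frac{P_{T}(W \mid O_u)}{P_{S}(W \mid O_u)}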