{"title":"具有序列级Kullback-Leibler散度的无格最大互信息声学模型研究","authors":"Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu","doi":"10.1109/ASRU.2017.8268918","DOIUrl":null,"url":null,"abstract":"Lattice-free maximum mutual information (LFMMI) was recently proposed as a mixture of the ideas of hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from various perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. Especially, we thoroughly investigate the use of the “sequence-level” Kullback-Leibler divergence with its novel and simple error derivation to enhance LFMMI-based AMs. In our experiment, we used the corpus of spontaneous Japanese (CSJ). Our best AM was an ensemble of three types of time delay neural networks and one long short-term memory-based network, and it finally achieved a WER of 6.94%, which is, to the best of our knowledge, the best published result for the CSJ.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence\",\"authors\":\"Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu\",\"doi\":\"10.1109/ASRU.2017.8268918\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Lattice-free maximum mutual information (LFMMI) was recently proposed as a mixture of the ideas of hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from various perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. Especially, we thoroughly investigate the use of the “sequence-level” Kullback-Leibler divergence with its novel and simple error derivation to enhance LFMMI-based AMs. In our experiment, we used the corpus of spontaneous Japanese (CSJ). 
Our best AM was an ensemble of three types of time delay neural networks and one long short-term memory-based network, and it finally achieved a WER of 6.94%, which is, to the best of our knowledge, the best published result for the CSJ.\",\"PeriodicalId\":290868,\"journal\":{\"name\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU.2017.8268918\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268918","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Investigation of lattice-free maximum mutual information-based acoustic models with sequence-level Kullback-Leibler divergence
Lattice-free maximum mutual information (LFMMI) was recently proposed as a blend of ideas from hidden-Markov-model-based acoustic models (AMs) and connectionist-temporal-classification-based AMs. In this paper, we investigate LFMMI from the perspectives of model combination, teacher-student training, and unsupervised speaker adaptation. In particular, we thoroughly investigate the use of a "sequence-level" Kullback-Leibler divergence, with a novel and simple error derivation, to enhance LFMMI-based AMs. In our experiments, we used the Corpus of Spontaneous Japanese (CSJ). Our best AM was an ensemble of three types of time-delay neural networks and one long short-term-memory-based network; it achieved a WER of 6.94%, which is, to the best of our knowledge, the best published result on the CSJ.
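For context, a minimal sketch of the two training criteria named above, in standard notation. The abstract does not reproduce the paper's own formulation, so the symbols here (O_u for the u-th utterance's acoustics, W for a word sequence, \kappa for the acoustic scale, P_T and P_S for teacher and student sequence posteriors) are generic assumptions rather than the authors' exact definitions:

% Conventional MMI criterion; in LFMMI the denominator sum over all
% competing sequences W is computed lattice-free with a phone-level
% denominator graph instead of word lattices.
\mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log
  \frac{p(O_u \mid W_u)^{\kappa} \, P(W_u)}
       {\sum_{W} p(O_u \mid W)^{\kappa} \, P(W)}

% A generic sequence-level KL divergence for teacher-student training:
% minimizing it drives the student posterior P_S toward the teacher's
% P_T over whole word sequences rather than per-frame labels.
\mathcal{F}_{\mathrm{KL}} = \sum_{u} \sum_{W}
  P_{T}(W \mid O_u) \, \log \frac{P_{T}(W \mid O_u)}{P_{S}(W \mid O_u)}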