Comparison of syllable-based and phoneme-based DNN-HMM in Japanese speech recognition

2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA). Pub Date: 2014-08-01. DOI: 10.1109/ICAICTA.2014.7005949

Hiroshi Seki, Kazumasa Yamamoto, S. Nakagawa


Citations: 11

Abstract

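Japanese is a syllabic language, and we have previously studied syllable-based GMM-HMMs for Japanese speech recognition. In this paper, we investigate the differences in recognition accuracy between phoneme-based and syllable-based GMM-HMM and DNN (Deep Neural Network)-HMM systems. First, we compare syllable-based and phoneme-based DNN-HMMs. Second, we train a tied-state, left-context-dependent syllable DNN-HMM and compare these three modeling methods. In the experiments, the left-context syllable DNN-HMM achieved a 5% relative WER gain over the left-context syllable GMM-HMM, and the triphone DNN-HMM achieved an 11% relative WER gain over a syllable-based DNN-HMM. Finally, on the ASJ+JNAS corpus, which consists of about 70 hours of speech, modeling left-context phonemes did not work, and the context-independent syllable-based DNN-HMM gave the best performance.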
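The abstract reports its results as relative WER gains but does not spell out the formula. A minimal sketch follows, assuming the usual definition (the reduction in WER divided by the baseline WER). The function name `relative_wer_gain` and the WER values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_wer_gain(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of a new system over a baseline.

    Assumed definition: (baseline - new) / baseline, as a fraction.
    Both arguments must use the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values for illustration only; not taken from the paper.
gain = relative_wer_gain(baseline_wer=20.0, new_wer=19.0)
print(f"relative WER gain: {gain:.1%}")
```

Under this definition, a drop from 20.0% to 19.0% WER is a 1-point absolute gain but a 5% relative gain, which is why relative figures look larger than the absolute differences between systems.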



