Comparison of syllable-based and phoneme-based DNN-HMM in Japanese speech recognition

2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA). Pub Date: 2014-08-01. DOI: 10.1109/ICAICTA.2014.7005949

Hiroshi Seki, Kazumasa Yamamoto, S. Nakagawa

Citations: 11

Abstract

Japanese is a syllabic language, and we have previously studied syllable-based GMM-HMMs for Japanese speech recognition. In this paper, we investigate the differences in recognition accuracy between phoneme-based and syllable-based GMM-HMMs and DNN (Deep Neural Network)-HMMs. First, we compare syllable-based and phoneme-based DNN-HMMs. Second, we train a tied-state left-context-dependent syllable DNN-HMM and compare these three modeling methods. In the experiments, we obtained a 5% relative WER gain using the left-context syllable DNN-HMM compared with a left-context syllable GMM-HMM, and an 11% relative WER gain using a triphone DNN-HMM compared with a syllable-based DNN-HMM. Finally, we found that left-context phoneme modeling did not work and that the context-independent syllable-based DNN-HMM achieved the best performance in the experiments on the ASJ+JNAS corpus, which consists of about 70 hours of speech.
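The abstract reports its results as relative gains in WER (word error rate). As a minimal sketch of how such a figure is usually computed, assuming the common definition (baseline WER minus new WER, divided by baseline WER), the function below may help. The function name and the example WER values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of a new system over a baseline.

    Both arguments are WERs in the same unit (for example, percent).
    The result is a fraction: 0.05 means a 5% relative gain.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper):
# a baseline at 20.0% WER and a new system at 19.0% WER.
gain = relative_wer_reduction(20.0, 19.0)
print(f"relative WER gain: {gain:.1%}")
```

Note that a relative gain differs from an absolute one: in the hypothetical case above, the absolute improvement is 1 point of WER, while the relative improvement is 5%.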

