Advances in Acoustic Modeling for Vietnamese LVCSR

2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI:10.1109/IALP.2009.66

Tuan-Nam Nguyen, Q. Vu

Citations: 11

Abstract

In this paper, we present our experiments on the selection of basic phonetic units for Vietnamese large-vocabulary continuous speech recognition (LVCSR). Two acoustic models were compared. The first model uses only vowels or monophthongs as phonemes [2], while the second, proposed in this paper, also explores the use of diphthongs and triphthongs as phonemes. Both models were trained and evaluated on a Broadcast News corpus containing 27 hours of acoustic training data and 1 hour of acoustic test data. In addition, a 146M-word corpus collected from newspapers was used to build the language models. Experimental results indicate significant improvements in both word accuracy rate and execution time. With the second acoustic model, the word accuracy rate reaches 86.06% in the best case, and execution is faster than real time.
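The abstract reports two evaluation figures: word accuracy rate and speed relative to real time. The sketch below illustrates how such figures are commonly computed. It is a minimal illustration under assumptions, not the authors' evaluation code. It assumes word accuracy is defined as 1 - (S + D + I) / N from a word-level edit-distance alignment, and that "faster than real time" means a real-time factor (decoding time divided by audio duration) below 1. The function names `word_accuracy` and `real_time_factor` are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy as 1 - (S + D + I) / N (assumed definition).

    S, D, I are the substitutions, deletions and insertions in a minimum
    edit-distance alignment of the hypothesis against the reference, and
    N is the number of reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]
    return 1.0 - errors / len(ref)


def real_time_factor(decode_seconds: float, audio_seconds: float) -> float:
    """Decoding time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Made-up illustration values, not figures from the paper.
    print(word_accuracy("a b c d", "a x c d"))
    print(real_time_factor(1800.0, 3600.0))
```

Note that under this definition the accuracy can drop below zero when the hypothesis contains many insertions, which is why it is usually reported alongside the raw error counts.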
