Syllable-Based Acoustic Modeling With Lattice-Free MMI for Mandarin Speech Recognition

Jie Li, Zhiyun Fan, Xiaorui Wang, Yan Li
{"title":"Syllable-Based Acoustic Modeling With Lattice-Free MMI for Mandarin Speech Recognition","authors":"Jie Li, Zhiyun Fan, Xiaorui Wang, Yan Li","doi":"10.1109/ISCSLP49672.2021.9362050","DOIUrl":null,"url":null,"abstract":"Most automatic speech recognition (ASR) systems in past decades have used context-dependent (CD) phones as the fundamental acoustic units. However, these phone-based approaches lack an easy and efficient way for modeling long-term temporal dependencies. Compared with phone units, syllables span a longer time, typically several phones, thereby having more stable acoustic realizations. In this work, we aim to train a syllable-based acoustic model for Mandarin ASR with lattice-free maximum mutual information (LF-MMI) criterion. We expect that, the combination of longer linguistic units, the RNN-based model structure and the sequence-level objective function, can result in better modeling of long-term temporal acoustic variations. We make multiple modifications to improve the performance of syllable-based AM and benchmark our models on two large-scale databases. 
Experimental results show that the proposed syllable-based AM performs much better than the CD phone-based baseline, especially on noisy test sets, with faster decoding speed.","PeriodicalId":279828,"journal":{"name":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCSLP49672.2021.9362050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Most automatic speech recognition (ASR) systems in past decades have used context-dependent (CD) phones as the fundamental acoustic units. However, these phone-based approaches lack an easy and efficient way to model long-term temporal dependencies. Compared with phone units, syllables span a longer time, typically several phones, and therefore have more stable acoustic realizations. In this work, we aim to train a syllable-based acoustic model for Mandarin ASR with the lattice-free maximum mutual information (LF-MMI) criterion. We expect that the combination of longer linguistic units, an RNN-based model structure, and a sequence-level objective function can result in better modeling of long-term temporal acoustic variations. We make multiple modifications to improve the performance of the syllable-based AM and benchmark our models on two large-scale databases. Experimental results show that the proposed syllable-based AM performs much better than the CD phone-based baseline, especially on noisy test sets, while also decoding faster.
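For context, the MMI criterion underlying LF-MMI training can be written as follows. This is the standard formulation rather than an equation taken from the paper; here \(\mathbf{X}_u\) is the observation sequence of utterance \(u\), \(w_u\) its reference transcript, \(\mathbb{M}_w\) the composite model for word sequence \(w\), \(P(w)\) the language-model probability, and \(\kappa\) the acoustic scale:

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u} \log
    \frac{p(\mathbf{X}_u \mid \mathbb{M}_{w_u})^{\kappa}\, P(w_u)}
         {\sum_{w} p(\mathbf{X}_u \mid \mathbb{M}_{w})^{\kappa}\, P(w)}
```

In the lattice-free variant, the denominator sum over competing hypotheses is computed exactly on a graph built from an n-gram language model over the acoustic units (phones in the original formulation; syllables in this work), rather than approximated with per-utterance word lattices.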