Trends and challenges in language modeling for speech recognition and machine translation

Holger Schwenk
{"title":"Trends and challenges in language modeling for speech recognition and machine translation","authors":"Holger Schwenk","doi":"10.1109/ASRU.2009.5373531","DOIUrl":null,"url":null,"abstract":"Language models play an important role in large vocabulary continuous speech recognition (LVCSR) systems and statistical approaches to machine translation (SMT), in particular when modeling morphologically rich languages. Despite intensive research over more than 20 years, state-of-the-art LVCSR and SMT systems seem to use only one dominant approach: n-gram back-off language models. This talk first reviews the most important approaches to language modeling. I then discuss some of the recent trends and challenges for the future. An interesting alternative to the back-off n-gram approach are the so-called continuous space methods. The basic idea is to perform the probability estimation in a continuous space. By these means better probability estimations of unseen word sequences can be expected. There is also a relative large body of works on adaptive language models. The adaptation can aim to tailor a language model to a particular task or domain, or it can be performed over time. Another very active research area are discriminative language models. Finally, I will review the challenges and benefits of language models trained an very large amounts of training material.","PeriodicalId":89617,"journal":{"name":"Proceedings. IEEE Workshop on Automatic Speech Recognition and Understanding","volume":"19 1","pages":"23"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE Workshop on Automatic Speech Recognition and Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373531","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Language models play an important role in large vocabulary continuous speech recognition (LVCSR) systems and statistical approaches to machine translation (SMT), in particular when modeling morphologically rich languages. Despite intensive research over more than 20 years, state-of-the-art LVCSR and SMT systems seem to rely on only one dominant approach: n-gram back-off language models. This talk first reviews the most important approaches to language modeling. I then discuss some of the recent trends and challenges for the future. An interesting alternative to the back-off n-gram approach is the so-called continuous space methods. The basic idea is to perform the probability estimation in a continuous space. In this way, better probability estimates of unseen word sequences can be expected. There is also a relatively large body of work on adaptive language models. The adaptation can aim to tailor a language model to a particular task or domain, or it can be performed over time. Another very active research area is discriminative language models. Finally, I will review the challenges and benefits of language models trained on very large amounts of training material.
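To make the continuous-space idea concrete, the following is a minimal sketch (not part of the original abstract): discrete history words are mapped to continuous vectors, and a small feed-forward layer with a softmax output estimates P(w | history). All dimensions, weights, and the function name next_word_distribution are illustrative assumptions, not the configuration discussed in the talk.

```python
import numpy as np

# Minimal continuous-space language model sketch (illustrative only):
# each history word is mapped to a continuous embedding, the concatenated
# embeddings pass through one hidden layer, and a softmax yields P(w | history).

rng = np.random.default_rng(0)

vocab_size = 1000    # |V|, assumed size of the vocabulary
embed_dim = 32       # dimension of the continuous word representation
context_size = 3     # n-1 history words, i.e. a 4-gram style model
hidden_dim = 64

# Parameters are randomly initialised here; in practice they are trained
# (e.g. by back-propagation on the training corpus).
embeddings = rng.normal(scale=0.1, size=(vocab_size, embed_dim))
W_hidden = rng.normal(scale=0.1, size=(context_size * embed_dim, hidden_dim))
b_hidden = np.zeros(hidden_dim)
W_out = rng.normal(scale=0.1, size=(hidden_dim, vocab_size))
b_out = np.zeros(vocab_size)

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def next_word_distribution(history_ids):
    """Return P(w | history) for every word w in the vocabulary."""
    assert len(history_ids) == context_size
    # Project the discrete history into the continuous space.
    x = np.concatenate([embeddings[i] for i in history_ids])
    h = np.tanh(x @ W_hidden + b_hidden)
    return softmax(h @ W_out + b_out)

# Because probabilities are computed from continuous representations,
# even an unseen word sequence receives a smooth, non-zero estimate.
probs = next_word_distribution([12, 57, 301])
print(probs.shape, probs.sum())  # (1000,) ~1.0
```

This contrasts with a back-off n-gram model, which falls back to shorter histories when a word sequence has not been observed, rather than generalizing through a learned continuous representation.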