Scaling shrinkage-based language models

Stanley F. Chen, L. Mangu, B. Ramabhadran, R. Sarikaya, A. Sethy
{"title":"Scaling shrinkage-based language models","authors":"Stanley F. Chen, L. Mangu, B. Ramabhadran, R. Sarikaya, A. Sethy","doi":"10.1109/ASRU.2009.5373380","DOIUrl":null,"url":null,"abstract":"In [1], we show that a novel class-based language model, Model M, and the method of regularized minimum discrimination information (rMDI) models outperform comparable methods on moderate amounts of Wall Street Journal data. Both of these methods are motivated by the observation that shrinking the sum of parameter magnitudes in an exponential language model tends to improve performance [2]. In this paper, we investigate whether these shrinkage-based techniques also perform well on larger training sets and on other domains. First, we explain why good performance on large data sets is uncertain, by showing that gains relative to a baseline n-gram model tend to decrease as training set size increases. Next, we evaluate several methods for data/model combination with Model M and rMDI models on limited-scale domains, to uncover which techniques should work best on large domains. Finally, we apply these methods on a variety of medium-to-large-scale domains covering several languages, and show that Model M consistently provides significant gains over existing language models for state-of-the-art systems in both speech recognition and machine translation.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373380","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41

Abstract

In [1], we show that a novel class-based language model, Model M, and the method of regularized minimum discrimination information (rMDI) models outperform comparable methods on moderate amounts of Wall Street Journal data. Both of these methods are motivated by the observation that shrinking the sum of parameter magnitudes in an exponential language model tends to improve performance [2]. In this paper, we investigate whether these shrinkage-based techniques also perform well on larger training sets and on other domains. First, we explain why good performance on large data sets is uncertain, by showing that gains relative to a baseline n-gram model tend to decrease as training set size increases. Next, we evaluate several methods for data/model combination with Model M and rMDI models on limited-scale domains, to uncover which techniques should work best on large domains. Finally, we apply these methods on a variety of medium-to-large-scale domains covering several languages, and show that Model M consistently provides significant gains over existing language models for state-of-the-art systems in both speech recognition and machine translation.
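The "shrinkage" the abstract refers to is regularization of the parameters of an exponential language model, which penalizes large parameter magnitudes. As a rough illustration only (the ℓ1-plus-Gaussian form and the symbols D, α, and σ below are generic placeholders, not details taken from this paper), a regularized training objective for a model p_Λ(y|x) ∝ exp(Σ_i λ_i f_i(x, y)) could be sketched as:

```latex
% Sketch of an l1 + l2^2 regularized objective for an exponential LM;
% alpha and sigma are illustrative regularizer weights, D is the number
% of training events (x_d, y_d).
\mathcal{O}(\Lambda) \;=\;
  -\frac{1}{D}\sum_{d=1}^{D}\log p_{\Lambda}(y_d \mid x_d)
  \;+\; \frac{\alpha}{D}\sum_i |\lambda_i|
  \;+\; \frac{1}{2\sigma^{2} D}\sum_i \lambda_i^{2}
```

Minimizing an objective of this kind drives many λ_i toward zero and shrinks the sum of parameter magnitudes, which is the quantity that [2] links to improved test performance.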