Improving N-gram language modeling for code-switching speech recognition

Zhiping Zeng, Haihua Xu, Tze Yuang Chong, Chng Eng Siong, Haizhou Li
{"title":"Improving N-gram language modeling for code-switching speech recognition","authors":"Zhiping Zeng, Haihua Xu, Tze Yuang Chong, Chng Eng Siong, Haizhou Li","doi":"10.1109/APSIPA.2017.8282279","DOIUrl":null,"url":null,"abstract":"Code-switching language modeling is challenging due to statistics of each individual language, as well as statistics of cross-lingual language are insufficient. To compensate for the issue of statistical insufficiency, in this paper we propose a word-class n-gram language modeling approach of which only infrequent words are clustered while most frequent words are treated as singleton classes themselves. We first demonstrate the effectiveness of the proposed method on our English-Mandarin code-switching SEAME data in terms of perplexity. Compared with the conventional word n-gram language models, as well as the word-class n-gram language models of which entire vocabulary words are clustered, the proposed word-class n- gram language modeling approach can yield lower perplexity on our SEAME dev data sets. Additionally, we observed further perplexity reduction by interpolating the word n-gram language models with the proposed word-class n-gram language models. We also attempted to build word-class n-gram language models using third-party text data with our proposed method, and similar perplexity performance improvement was obtained on our SEAME dev data sets when they are interpolated with the word n-gram language models. Finally, to examine the contribution of the proposed language modeling approach to code-switching speech recognition, we conducted lattice based n-best rescoring.","PeriodicalId":142091,"journal":{"name":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPA.2017.8282279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Code-switching language modeling is challenging because the statistics of each individual language, as well as the cross-lingual statistics, are insufficient. To compensate for this statistical insufficiency, in this paper we propose a word-class n-gram language modeling approach in which only infrequent words are clustered, while the most frequent words are treated as singleton classes. We first demonstrate the effectiveness of the proposed method, in terms of perplexity, on our English-Mandarin code-switching SEAME data. Compared with conventional word n-gram language models, as well as word-class n-gram language models in which the entire vocabulary is clustered, the proposed word-class n-gram language modeling approach yields lower perplexity on our SEAME dev data sets. Additionally, we observed further perplexity reduction by interpolating the word n-gram language models with the proposed word-class n-gram language models. We also applied the proposed method to build word-class n-gram language models from third-party text data, and obtained similar perplexity improvements on our SEAME dev data sets when these models were interpolated with the word n-gram language models. Finally, to examine the contribution of the proposed language modeling approach to code-switching speech recognition, we conducted lattice-based n-best rescoring.
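
To make the clustering and interpolation concrete, below is a minimal Python sketch of the idea described in the abstract. It is illustrative throughout, not the paper's implementation: the frequency threshold, the hash-based assignment of infrequent words to shared classes (a cheap, deterministic stand-in for a proper clustering algorithm), the toy corpus, and the interpolation weight lam are all invented for the example. The class model scores a bigram as P(c(w2) | c(w1)) * P(w2 | c(w2)); for a frequent word the class is a singleton, so the emission term is 1 and the class model coincides with word statistics exactly where they are reliable.

    import zlib
    from collections import Counter

    FREQ_THRESHOLD = 2   # assumed cutoff: words seen at least this often stay singletons
    N_RARE_CLASSES = 4   # assumed number of shared classes for infrequent words

    def build_class_map(tokens):
        """Frequent words map to themselves (singleton classes); infrequent
        words are hashed into a few shared classes. The hash is only a
        deterministic stand-in for a real word-clustering algorithm."""
        counts = Counter(tokens)
        return {w: w if c >= FREQ_THRESHOLD
                else "CLS%d" % (zlib.crc32(w.encode()) % N_RARE_CLASSES)
                for w, c in counts.items()}

    def bigram_mle(seq):
        """Unsmoothed maximum-likelihood bigram model over a token sequence."""
        hist = Counter(seq[:-1])
        pairs = Counter(zip(seq[:-1], seq[1:]))
        return lambda prev, cur: pairs[(prev, cur)] / hist[prev] if hist[prev] else 0.0

    # Toy code-switching-flavored corpus, purely for illustration.
    tokens = "i want to 吃 lunch now i want to 喝 coffee now".split()
    class_map = build_class_map(tokens)
    class_seq = [class_map[w] for w in tokens]

    word_model = bigram_mle(tokens)       # word n-gram LM
    class_model = bigram_mle(class_seq)   # class n-gram LM

    # Emission P(w | c(w)) = count(w) / count(c(w)); equals 1 for singletons.
    word_counts, class_counts = Counter(tokens), Counter(class_seq)
    emit = {w: word_counts[w] / class_counts[class_map[w]] for w in word_counts}

    lam = 0.5  # assumed interpolation weight; would be tuned on dev data
    for prev, cur in [("to", "吃"), ("to", "喝")]:
        p_word = word_model(prev, cur)
        p_class = class_model(class_map[prev], class_map[cur]) * emit[cur]
        print(prev, cur, lam * p_word + (1 - lam) * p_class)

Because frequent words keep singleton classes, the class model shares statistics only among the infrequent words, where word-level counts are sparse; this is the intuition behind the perplexity gains the abstract reports. In practice the interpolation weight would be tuned on the dev sets rather than fixed as above.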