Improving N-gram language modeling for code-switching speech recognition
Zhiping Zeng, Haihua Xu, Tze Yuang Chong, Chng Eng Siong, Haizhou Li
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), December 2017
DOI: 10.1109/APSIPA.2017.8282279
Code-switching language modeling is challenging because the statistics of each individual language, as well as the cross-lingual statistics, are insufficient. To compensate for this statistical insufficiency, in this paper we propose a word-class n-gram language modeling approach in which only infrequent words are clustered, while the most frequent words are treated as singleton classes. We first demonstrate the effectiveness of the proposed method, in terms of perplexity, on our English-Mandarin code-switching SEAME data. Compared with conventional word n-gram language models, as well as word-class n-gram language models in which the entire vocabulary is clustered, the proposed word-class n-gram language modeling approach yields lower perplexity on our SEAME dev data sets. Additionally, we observed further perplexity reduction by interpolating the word n-gram language models with the proposed word-class n-gram language models. We also built word-class n-gram language models from third-party text data with the proposed method, and obtained similar perplexity improvements on our SEAME dev data sets when these models were interpolated with the word n-gram language models. Finally, to examine the contribution of the proposed language modeling approach to code-switching speech recognition, we conducted lattice-based n-best rescoring.
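The core idea described in the abstract can be illustrated with a small, self-contained sketch. The code below is a hypothetical illustration under our own assumptions, not the authors' implementation: frequent words keep their own singleton classes, infrequent words are pooled into shared classes (assigned here by a trivial hash rather than a real clustering algorithm such as Brown or exchange-based clustering), and the resulting class bigram LM is linearly interpolated with a word bigram LM. The toy corpus, frequency threshold, interpolation weight, and all function names are illustrative assumptions.

```python
# Hypothetical sketch of a partial word-class bigram LM interpolated with a word bigram LM.
# Not the authors' code; clustering, smoothing, and the toy data are assumptions for illustration.
from collections import Counter, defaultdict

def build_classes(corpus, min_count=2, n_shared_classes=10):
    """Frequent words become singleton classes; infrequent words are pooled into
    a small set of shared classes (hash-assigned here purely for illustration)."""
    counts = Counter(w for sent in corpus for w in sent)
    word2class = {}
    for w, c in counts.items():
        if c >= min_count:
            word2class[w] = f"SINGLETON_{w}"                       # frequent: its own class
        else:
            word2class[w] = f"CLASS_{hash(w) % n_shared_classes}"  # infrequent: shared class
    return word2class, counts

def train_bigram(sequences):
    """Add-one-smoothed bigram probabilities p(y | x) over token sequences."""
    bigram, vocab = defaultdict(Counter), set()
    for seq in sequences:
        vocab.update(seq)
        for x, y in zip(seq[:-1], seq[1:]):
            bigram[x][y] += 1
    def prob(y, x):
        return (bigram[x][y] + 1) / (sum(bigram[x].values()) + len(vocab))
    return prob

def class_lm_prob(w, h, word2class, counts, class_bigram):
    """p(w | h) under the class LM: p(c(w) | c(h)) * p(w | c(w))."""
    cw, ch = word2class[w], word2class[h]
    class_total = sum(c for v, c in counts.items() if word2class[v] == cw)
    emission = counts[w] / class_total   # p(w | c(w)); equals 1.0 for singleton classes
    return class_bigram(cw, ch) * emission

# Toy English-Mandarin code-switching corpus (assumption, for illustration only).
corpus = [["i", "吃", "rice", "today"], ["i", "吃", "noodles", "today"],
          ["we", "去", "school", "now"], ["i", "吃", "rice", "now"]]

word2class, counts = build_classes(corpus, min_count=2)
word_bigram = train_bigram(corpus)
class_bigram = train_bigram([[word2class[w] for w in sent] for sent in corpus])

# Linear interpolation of the word LM and the partial word-class LM.
lam = 0.6
w, h = "noodles", "吃"
p = lam * word_bigram(w, h) + (1 - lam) * class_lm_prob(w, h, word2class, counts, class_bigram)
print(f"p({w} | {h}) = {p:.4f}")
```

The intent of the partial clustering, as the abstract describes it, is that frequent words retain their own reliable n-gram statistics while rare words borrow statistics from their class, which is where the perplexity gain on sparse code-switching data is expected to come from.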