Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models

2010 IEEE International Conference on Acoustics, Speech and Signal Processing. Publication date: 2010-03-14. DOI: 10.1109/ICASSP.2010.5495646

L. Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, O. Glembek, N. Goel, M. Karafiát, Daniel Povey, A. Rastrow, R. Rose, Samuel Thomas

Citations: 189

Abstract

Although research has previously been done on multilingual speech recognition, it has been found to be very difficult to improve over separately trained systems. The usual approach has been to use some kind of “universal phone set” that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the model has parameters not tied to specific states that are shared across languages. We use a model called a “Subspace Gaussian Mixture Model” where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.
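The abstract describes the SGMM only in words. The Python below is a minimal sketch of one common way to write it down, under stated assumptions: each state j keeps only a low-dimensional vector v_j; Gaussian i of that state gets mean M_i v_j and a weight given by a softmax over w_i · v_j; and the projections M_i, weight vectors w_i and covariances form the shared "subspace" parameters. All symbol names, the toy sizes, the diagonal unit variances (a simplification of full covariances) and the state names are illustrative assumptions, not values or code from the paper.

```python
import math
import random

# Toy sizes chosen for illustration only (assumptions, not from the paper).
FEAT_DIM = 3       # D: acoustic feature dimension
SUBSPACE_DIM = 2   # S: size of each per-state vector v_j
NUM_GAUSS = 4      # I: Gaussians per state (common structure for all states)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logsumexp(values):
    # Numerically stable log(sum(exp(values))).
    top = max(values)
    return top + math.log(sum(math.exp(t - top) for t in values))


class SharedSubspace:
    """Parameters shared by every state of every language (assumed layout):
    projections M_i (D x S), weight vectors w_i (length S), variances."""

    def __init__(self, rng):
        self.M = [[[rng.gauss(0.0, 1.0) for _ in range(SUBSPACE_DIM)]
                   for _ in range(FEAT_DIM)]
                  for _ in range(NUM_GAUSS)]
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(SUBSPACE_DIM)]
                  for _ in range(NUM_GAUSS)]
        # Simplification: diagonal unit variances instead of full Sigma_i.
        self.var = [[1.0] * FEAT_DIM for _ in range(NUM_GAUSS)]

    def state_gmm(self, v):
        """Expand a state vector v_j into its GMM: mean_i = M_i v_j and
        log weight_i = w_i . v_j - logsumexp over i' of (w_i' . v_j)."""
        means = [[dot(row, v) for row in m_i] for m_i in self.M]
        logits = [dot(w_i, v) for w_i in self.w]
        norm = logsumexp(logits)
        log_weights = [z - norm for z in logits]
        return log_weights, means, self.var

    def log_likelihood(self, x, v):
        """log p(x | state) under the GMM expanded from state vector v."""
        log_weights, means, variances = self.state_gmm(v)
        terms = []
        for lw, mu, var in zip(log_weights, means, variances):
            ll = lw
            for xd, md, vd in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            terms.append(ll)
        return logsumexp(terms)


# One shared subspace, two languages with entirely distinct state sets.
# State names and vectors are hypothetical.
rng = random.Random(0)
shared = SharedSubspace(rng)
states = {
    "lang_a": {"a_1": [0.5, -0.2], "a_2": [-1.0, 0.3]},
    "lang_b": {"b_1": [0.1, 0.9]},
}

frame = [0.2, -0.4, 1.0]
for lang, lang_states in states.items():
    for name, v in lang_states.items():
        print(lang, name, round(shared.log_likelihood(frame, v), 3))
```

Both languages score frames against the same `shared` object while keeping disjoint state inventories, mirroring the abstract's setup of distinct phone sets with subspace parameters shared across languages. In this sketch a new language adds only S numbers per state, which is one plausible reading of why the approach helps most when in-language training data is scarce.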
