多类型语音识别的增量半监督学习

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Pub Date : 2020-05-01 DOI:10.1109/ICASSP40776.2020.9054309

B. K. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlícek, J. Billa

{"title":"多类型语音识别的增量半监督学习","authors":"B. K. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlícek, J. Billa","doi":"10.1109/ICASSP40776.2020.9054309","DOIUrl":null,"url":null,"abstract":"In this work, we explore a data scheduling strategy for semi-supervised learning (SSL) for acoustic modeling in automatic speech recognition. The conventional approach uses a seed model trained with supervised data to automatically recognize the entire set of unlabeled (auxiliary) data to generate new labels for subsequent acoustic model training. In this paper, we propose an approach in which the unlabelled set is divided into multiple equal-sized subsets. These subsets are processed in an incremental fashion: for each iteration a new subset is added to the data used for SSL, starting from only one subset in the first iteration. The acoustic model from the previous iteration becomes the seed model for the next one. This scheduling strategy is compared to the approach employing all unlabeled data in one-shot for training. Experiments using lattice-free maximum mutual information based acoustic model training on Fisher English gives 80% word error recovery rate. On the multi-genre evaluation sets on Lithuanian and Bulgarian relative improvements of up to 17.2% in word error rate are observed.","PeriodicalId":13127,"journal":{"name":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"332 1","pages":"7419-7423"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition\",\"authors\":\"B. K. Khonglah, S. Madikeri, S. Dey, H. Bourlard, P. Motlícek, J. Billa\",\"doi\":\"10.1109/ICASSP40776.2020.9054309\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this work, we explore a data scheduling strategy for semi-supervised learning (SSL) for acoustic modeling in automatic speech recognition. The conventional approach uses a seed model trained with supervised data to automatically recognize the entire set of unlabeled (auxiliary) data to generate new labels for subsequent acoustic model training. In this paper, we propose an approach in which the unlabelled set is divided into multiple equal-sized subsets. These subsets are processed in an incremental fashion: for each iteration a new subset is added to the data used for SSL, starting from only one subset in the first iteration. The acoustic model from the previous iteration becomes the seed model for the next one. This scheduling strategy is compared to the approach employing all unlabeled data in one-shot for training. Experiments using lattice-free maximum mutual information based acoustic model training on Fisher English gives 80% word error recovery rate. On the multi-genre evaluation sets on Lithuanian and Bulgarian relative improvements of up to 17.2% in word error rate are observed.\",\"PeriodicalId\":13127,\"journal\":{\"name\":\"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"volume\":\"332 1\",\"pages\":\"7419-7423\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP40776.2020.9054309\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP40776.2020.9054309","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 17

摘要

在这项工作中，我们探索了一种用于自动语音识别声学建模的半监督学习(SSL)数据调度策略。传统方法使用经过监督数据训练的种子模型来自动识别整个未标记(辅助)数据集，为后续声学模型训练生成新的标签。在本文中，我们提出了一种将未标记集划分为多个等大小子集的方法。这些子集以增量方式处理:对于每个迭代，将一个新的子集添加到用于SSL的数据中，从第一次迭代中的一个子集开始。前一次迭代的声学模型成为下一次迭代的种子模型。将此调度策略与一次性使用所有未标记数据进行训练的方法进行了比较。使用基于无格最大互信息的声学模型训练Fisher英语的实验，单词错误恢复率达到80%。在多体裁评价集上，立陶宛语和保加利亚语的错误率相对提高了17.2%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition

In this work, we explore a data scheduling strategy for semi-supervised learning (SSL) for acoustic modeling in automatic speech recognition. The conventional approach uses a seed model trained with supervised data to automatically recognize the entire set of unlabeled (auxiliary) data to generate new labels for subsequent acoustic model training. In this paper, we propose an approach in which the unlabelled set is divided into multiple equal-sized subsets. These subsets are processed in an incremental fashion: for each iteration a new subset is added to the data used for SSL, starting from only one subset in the first iteration. The acoustic model from the previous iteration becomes the seed model for the next one. This scheduling strategy is compared to the approach employing all unlabeled data in one-shot for training. Experiments using lattice-free maximum mutual information based acoustic model training on Fisher English gives 80% word error recovery rate. On the multi-genre evaluation sets on Lithuanian and Bulgarian relative improvements of up to 17.2% in word error rate are observed.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

自引率

0.00%

发文量