Semi-Supervised Learning with Limited Data for Automatic Speech Recognition

2022 IEEE 7th Forum on Research and Technologies for Society and Industry Innovation (RTSI). Pub Date: 2022-08-24. DOI: 10.1109/RTSI55261.2022.9905112

Mikolaj Pudo, Natalia Szczepanek, Bozena Lukasiak, A. Janicki


Abstract

In this paper, we analyze the performance of semi-supervised learning (SSL) methods for the automatic speech recognition (ASR) task. We focus on the case of model adaptation using small unlabeled datasets. The basic SSL method that we apply uses pseudo-labels generated by the adapted model itself; however, we also propose and analyze a number of improvements to SSL. Furthermore, we investigate the possibility of using these methods on datasets whose token distributions differ significantly from that of the training data. We show that, under certain conditions, even very small amounts of data can improve ASR model performance. Using the proposed SSL variant, we were able to reduce the word error rate (WER) by 12-22%, depending on the dataset.
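As a rough illustration of the two ideas the abstract relies on, the sketch below shows how pseudo-labels can be produced by a model for its own adaptation and how WER is commonly computed. It is a minimal sketch, not the authors' implementation: the names `pseudo_label`, `word_error_rate`, `relative_wer_reduction`, and the `transcribe` callable are hypothetical, the fine-tuning step itself is omitted, and reading the reported 12-22% as a relative reduction is an assumption that the abstract does not confirm.

```python
from typing import Callable, List, Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def pseudo_label(transcribe: Callable[[str], str],
                 unlabeled: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each unlabeled utterance with the model's own transcript.

    `transcribe` is a hypothetical stand-in for the ASR model being adapted;
    the resulting pairs would serve as training targets for fine-tuning.
    """
    return [(utterance, transcribe(utterance)) for utterance in unlabeled]


def relative_wer_reduction(baseline: float, adapted: float) -> float:
    """Fraction by which WER dropped relative to the baseline (assumed metric)."""
    return (baseline - adapted) / baseline
```

For example, a hypothesis that substitutes one word and drops another against a six-word reference yields a WER of 2/6, and a drop from 20% to 16% WER corresponds to a 20% relative reduction under the assumption stated above.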

