M-CTRL: A Continual Representation Learning Framework with Slowly Improving Past Pre-Trained Model
Authors: Jin-Seong Choi, Jaehwan Lee, Chae-Won Lee, Joon‐Hyuk Chang
DOI: 10.1109/ICASSP49357.2023.10096793
Venue: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication date: 2023-06-04
Citations: 0
Abstract
Representation models pre-trained on unlabeled data show competitive performance in speech recognition, even when fine-tuned on only small amounts of labeled data. The continual representation learning (CTRL) framework combines pre-training and continual learning methods to obtain powerful representations. CTRL relies on two neural networks, an online model and an offline model, where the latter is fixed and transfers information to the former through a continual learning loss. In this paper, we present momentum continual representation learning (M-CTRL), a framework that slowly updates the offline model with an exponential moving average of the online model. Our framework aims to capture information from an offline model that improves on both past and new domains. To evaluate our framework, we continually pre-train wav2vec 2.0 with M-CTRL on the following corpora in order: LibriSpeech, Wall Street Journal, and TED-LIUM V3. Our experiments demonstrate that, compared to CTRL, M-CTRL improves performance on the new domain and reduces information loss on past domains.
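The core mechanism the abstract describes — slowly updating the offline model with an exponential moving average (EMA) of the online model — can be sketched in a few lines. This is a minimal illustration in plain Python (not the paper's implementation); the parameter lists, the function name, and the momentum coefficient `m` are illustrative assumptions, and in practice the update would be applied to every parameter tensor of the two networks.

```python
def ema_update(offline_params, online_params, m=0.999):
    """Momentum (EMA) update of the offline model's parameters:

        theta_offline <- m * theta_offline + (1 - m) * theta_online

    A momentum m close to 1 makes the offline model change slowly,
    so it retains information from past domains while gradually
    absorbing what the online model learns on the new domain.
    """
    return [m * p_off + (1.0 - m) * p_on
            for p_off, p_on in zip(offline_params, online_params)]

# Toy usage: two scalar "parameters" per model.
offline = [0.0, 1.0]
online = [1.0, 1.0]
updated = ema_update(offline, online, m=0.9)  # first entry moves 10% toward 1.0
```

With m = 0.9 the first offline parameter moves from 0.0 to approximately 0.1, i.e. only a tenth of the way toward the online value, which is the "slowly improving" behavior the title refers to; a fixed offline model, as in CTRL, corresponds to m = 1.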