Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization

2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP). Pub Date: 2020-11-03. DOI: 10.1109/ISCSLP49672.2021.9362068

Disong Wang, Jianwei Yu, Xixin Wu, Lifa Sun, Xunying Liu, H. Meng


Citations: 26

Abstract

Dysarthric speech recognition is challenging because dysarthric data is limited and its acoustics deviate significantly from those of normal speech. Model-based speaker adaptation is a promising approach: the limited dysarthric speech is used to fine-tune a base model, pre-trained on large amounts of normal speech, to obtain speaker-dependent models. However, statistical distribution mismatches between normal and dysarthric speech data limit the adaptation performance of the base model. To address this problem, we propose to re-initialize the base model via meta-learning to obtain a better model initialization. Specifically, we focus on end-to-end models and extend the model-agnostic meta-learning (MAML) and Reptile algorithms to meta-update the base model by repeatedly simulating adaptation to different dysarthric speakers. As a result, the re-initialized model acquires dysarthric speech knowledge and learns how to adapt quickly to unseen dysarthric speakers with improved performance. Experimental results on the UASpeech dataset show that the best model with the proposed methods achieves 54.2% and 7.6% relative word error rate reductions compared with the base model without fine-tuning and the model directly fine-tuned from the base model, respectively, and that it is comparable with the state-of-the-art hybrid DNN-HMM model.
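The re-initialization idea in the abstract can be illustrated with a minimal sketch of a plain first-order Reptile update. This is not the authors' extended algorithm or their end-to-end model: a single scalar weight stands in for the base model, synthetic linear data stands in for each speaker, and every name (`adapt`, `reptile_reinit`, `make_speaker`, the slopes and learning rates) is a hypothetical choice made only for illustration.

```python
import random


def adapt(w, data, lr=0.1, steps=5):
    """Inner loop: fine-tune the toy model y = w * x on one speaker's data."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def reptile_reinit(w_base, speakers, meta_lr=0.5, meta_iters=200, seed=0):
    """Outer loop: repeatedly simulate adaptation and move the init toward the result."""
    rng = random.Random(seed)
    w = w_base
    for _ in range(meta_iters):
        data = rng.choice(speakers)       # sample one training speaker
        w_adapted = adapt(w, data)        # simulate adaptation to that speaker
        w += meta_lr * (w_adapted - w)    # Reptile step toward the adapted weight
    return w


def make_speaker(slope, n=20, seed=0):
    """Hypothetical 'speaker': noiseless samples of y = slope * x."""
    rng = random.Random(seed)
    return [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]


def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Assumed toy setup: the base model (w = 0) is far from all 'dysarthric' speakers.
train_speakers = [make_speaker(s, seed=i) for i, s in enumerate([2.5, 3.0, 3.5])]
unseen_speaker = make_speaker(3.2, seed=99)

w_base = 0.0
w_meta = reptile_reinit(w_base, train_speakers)

# Same small adaptation budget from both initializations.
loss_from_base = mse(adapt(w_base, unseen_speaker, steps=3), unseen_speaker)
loss_from_meta = mse(adapt(w_meta, unseen_speaker, steps=3), unseen_speaker)
print(f"re-initialized weight: {w_meta:.3f}")
print(f"loss after 3 steps from base init: {loss_from_base:.4f}")
print(f"loss after 3 steps from meta init: {loss_from_meta:.4f}")
```

In this toy setting the meta-updated initialization sits close to the training speakers, so the same few adaptation steps leave a smaller loss on the unseen speaker than starting from the base weight. It only mirrors the intuition of the abstract and says nothing about the reported word error rates.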



