基于注意力的在线编码器-解码器模型的改进多阶段训练

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) Pub Date : 2019-12-01 DOI:10.1109/ASRU46091.2019.9003936

Abhinav Garg, Dhananjaya N. Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

{"title":"基于注意力的在线编码器-解码器模型的改进多阶段训练","authors":"Abhinav Garg, Dhananjaya N. Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim","doi":"10.1109/ASRU46091.2019.9003936","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two-levels of linguistic granularity namely, character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show ~35% and ~10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models respectively after fusion with long short-term memory (LSTM) based external language model (LM).","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models\",\"authors\":\"Abhinav Garg, Dhananjaya N. Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim\",\"doi\":\"10.1109/ASRU46091.2019.9003936\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two-levels of linguistic granularity namely, character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show ~35% and ~10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models respectively after fusion with long short-term memory (LSTM) based external language model (LM).\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003936\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003936","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

在本文中，我们提出了一种改进的多阶段多任务训练策略，以提高在线基于注意力的编码器-解码器(AED)模型的性能。提出了基于字符编码器、基于字节对编码(byte pair encoding, BPE)的编码器和注意解码器三个层次结构粒度的三阶段训练方法。此外，还采用了基于两级语言粒度即字符和BPE的多任务学习。我们探讨了编码器的不同预训练策略，包括双向编码器的迁移学习。我们的具有在线注意力的编码器-解码器模型分别在较小和较大模型的基线上显示出~35%和~10%的相对改进。我们的模型与基于长短期记忆(LSTM)的外部语言模型(LM)融合后，在libisspeech测试清洁数据上分别实现了5.04%和4.48%的单词错误率(WER)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two-levels of linguistic granularity namely, character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show ~35% and ~10% relative improvement over their baselines for smaller and bigger models, respectively. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models respectively after fusion with long short-term memory (LSTM) based external language model (LM).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

自引率

0.00%

发文量