Improved Multi-Stage Training of Online Attention-Based Encoder-Decoder Models

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). Pub Date: 2019-12-01. DOI: 10.1109/ASRU46091.2019.9003936

Abhinav Garg, Dhananjaya N. Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

Citations: 13

Abstract

In this paper, we propose a refined multi-stage, multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. We propose a three-stage training scheme based on three levels of architectural granularity, namely a character encoder, a byte pair encoding (BPE) based encoder, and an attention decoder. We also use multi-task learning based on two levels of linguistic granularity, namely character and BPE. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our encoder-decoder models with online attention show ~35% and ~10% relative improvement over their baselines for the smaller and bigger models, respectively. After fusion with a long short-term memory (LSTM) based external language model (LM), our models achieve word error rates (WER) of 5.04% and 4.48% on the LibriSpeech test-clean data for the smaller and bigger models, respectively.
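The abstract reports results as WER and as relative improvement over a baseline. As a minimal sketch of how those two quantities are conventionally computed (not the authors' scoring pipeline), the Python below uses a word-level edit distance for WER and the usual (baseline - new) / baseline ratio. The function names `word_error_rate` and `relative_improvement` and the example sentences are hypothetical and chosen only for illustration; no baseline figures from the paper are assumed.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative sentences only, not taken from LibriSpeech.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")
    # Illustrative WER values only, not results from the paper.
    print(f"Relative improvement: {relative_improvement(10.0, 9.0):.1%}")
```

Under this definition, a relative improvement of ~10% means the new WER is roughly 0.9 times the baseline WER, which is a different statement from a 10-point absolute reduction.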


