Convolutional Dropout and Wordpiece Augmentation for End-to-End Speech Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Pub Date: 2021-06-06
DOI: 10.1109/ICASSP39728.2021.9415004

Hainan Xu, Yinghui Huang, Yun Zhu, Kartik Audhkhasi, B. Ramabhadran


Citations: 3

Abstract

Regularization and data augmentation are crucial to training end-to-end automatic speech recognition systems. Dropout is a popular regularization technique that operates on each neuron independently by multiplying it by a Bernoulli random variable. We propose a generalization of dropout, called "convolutional dropout", where each neuron’s activation is replaced with a randomly weighted linear combination of neuron values in its neighborhood. We believe that this formulation combines the regularizing effect of dropout with the smoothing effect of the convolution operation. In addition to convolutional dropout, this paper also proposes using random word-piece segmentations as a data augmentation scheme during training, inspired by results in neural machine translation. We adopt both of these methods when training transformer-transducer speech recognition models, and show consistent WER improvements on LibriSpeech as well as across different languages.
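
The contrast between standard dropout and the neighborhood-based generalization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the random weights are drawn, so the sketch assumes independent Bernoulli keep masks rescaled to sum to 1 in expectation, a 1-D neighborhood, and edge padding. The function names and the `radius` parameter are hypothetical.

```python
import numpy as np


def standard_dropout(x, drop_prob, rng):
    """Inverted dropout: scale each activation by an independent Bernoulli keep mask."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep / (1.0 - drop_prob)


def convolutional_dropout_1d(x, drop_prob, rng, radius=1):
    """Replace each x[i] with a randomly weighted sum of x[i - radius .. i + radius].

    Assumption (not from the paper): each weight is an independent Bernoulli
    keep mask, rescaled so the expected weights in a window sum to 1.
    """
    n = x.shape[0]
    width = 2 * radius + 1
    # Edge-pad so every position has a full neighborhood (a hypothetical choice).
    padded = np.pad(x, radius, mode="edge")
    # windows[i, k] == padded[i + k], i.e. the neighbor at offset k - radius from i.
    windows = np.stack([padded[k:k + n] for k in range(width)], axis=1)
    keep = rng.random(windows.shape) >= drop_prob
    weights = keep / ((1.0 - drop_prob) * width)
    return (windows * weights).sum(axis=1)


rng = np.random.default_rng(0)
activations = np.linspace(-1.0, 1.0, 8)
dropped = standard_dropout(activations, 0.2, rng)
smoothed = convolutional_dropout_1d(activations, 0.2, rng, radius=1)
```

Under these assumptions, setting `radius=0` shrinks each neighborhood to the neuron itself, so the sketch reduces to standard dropout. Larger radii mix in neighboring values, which is one way to read the smoothing effect the abstract attributes to the convolution.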

