S. Indurthi, Mohd Abbas Zaidi, Nikhil Kumar Lakumarapu, Beomseok Lee, HyoJung Han, Seokchan Ahn, Sangha Kim, Chanwoo Kim, Inchul Hwang
Title: Task Aware Multi-Task Learning for Speech to Text Tasks
DOI: 10.1109/ICASSP39728.2021.9414703
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication date: 2021-06-06
Cited by: 18
Abstract
In general, direct speech-to-text translation (ST) models are jointly trained with automatic speech recognition (ASR) and machine translation (MT) tasks. However, issues with current joint learning strategies inhibit knowledge transfer across these tasks. We propose a task modulation network that allows the model to learn task-specific features while simultaneously learning shared features. The proposed approach removes the need for a separate fine-tuning step, resulting in a single model that performs all three tasks. This single model achieves a BLEU score of 28.64 on the MuST-C English-German ST task, a WER of 11.61% on the TEDLium v3 ASR task, and a BLEU score of 23.35 on the WMT'15 English-German MT task. This sets a new state-of-the-art (SOTA) on the ST task while outperforming existing end-to-end ASR systems.
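The core idea of the abstract — task-specific modulation of shared features so a single model can serve ASR, MT, and ST — can be illustrated with a minimal sketch. This is not the paper's actual architecture; it is a toy FiLM-style scale-and-shift modulation, and all names (`shared_layer`, `TASK_PARAMS`, `modulate`, `forward`) and parameter values are hypothetical.

```python
# Toy sketch of task-aware modulation (illustrative only, NOT the paper's
# exact method): a shared layer's output is scaled and shifted by
# task-specific parameters, so one set of shared weights serves all tasks.

def shared_layer(x):
    # Stand-in for a shared encoder layer (here: a fixed affine map).
    return [2.0 * v + 1.0 for v in x]

# Hypothetical per-task modulation parameters (scale, shift).
TASK_PARAMS = {
    "asr": (1.0, 0.0),    # identity: use shared features as-is
    "mt":  (0.5, 0.25),
    "st":  (1.5, -0.5),
}

def modulate(features, task):
    # Apply the task's scale and shift to the shared features.
    scale, shift = TASK_PARAMS[task]
    return [scale * f + shift for f in features]

def forward(x, task):
    # One model, three tasks: the task tag selects the modulation.
    return modulate(shared_layer(x), task)

print(forward([1.0, 2.0], "st"))   # [4.0, 7.0]
print(forward([1.0, 2.0], "asr"))  # [3.0, 5.0]
```

In a real multi-task setup, the modulation parameters would be learned jointly with the shared weights, so the shared layers capture cross-task knowledge while the per-task parameters carve out task-specific behavior.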