Cross-Modal Decision Regularization for Simultaneous Speech Translation

Interspeech, Pub Date: 2022-09-18, DOI: 10.21437/interspeech.2022-10617

Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim

Citations: 3

Abstract

Simultaneous translation systems start producing output while still processing a partial source sentence from the incoming input stream. These systems need to decide when to read more input and when to write output. The decisions taken by the model depend on the structure of the source and target languages and on the information contained in the partial input sequence. Hence, the read/write decision policy remains the same across different input modalities, i.e., speech and text. This motivates us to leverage the text transcripts corresponding to the speech input to improve simultaneous speech-to-text translation (SimulST). We propose Cross-Modal Decision Regularization (CMDR) to improve the decision policy of SimulST systems by using the simultaneous text-to-text translation (SimulMT) task. We also extend several techniques from the offline speech translation domain to explore the role of the SimulMT task in improving SimulST performance. Overall, we achieve a 34.66% / 4.5 BLEU improvement over the baseline model across different latency regimes for the MuST-C English-German (EnDe) SimulST task.
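The read/write decision loop described in the abstract can be illustrated with a minimal sketch. This is not the CMDR method or the learned policy from the paper: it assumes a fixed wait-k rule as a stand-in policy, and the names `wait_k_policy`, `simultaneous_decode`, and `toy_translate_step` are hypothetical helpers introduced only for illustration. The sketch shows that the same policy interface works whether the source units are text tokens or speech segments.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Illustrative assumption: a fixed wait-k rule, NOT the learned policy from the paper.
Policy = Callable[[int, int, bool], str]


def wait_k_policy(k: int) -> Policy:
    """Return a policy that keeps the reader k source units ahead of the writer."""
    def policy(num_read: int, num_written: int, source_finished: bool) -> str:
        # Once the source is exhausted, only writing is possible.
        if source_finished or num_read - num_written >= k:
            return "WRITE"
        return "READ"
    return policy


def simultaneous_decode(
    source: Sequence[str],
    policy: Policy,
    translate_step: Callable[[List[str], List[str]], Optional[str]],
    max_len: int = 50,
) -> Tuple[List[str], List[str]]:
    """Interleave READ and WRITE actions over a streaming source."""
    read: List[str] = []      # source units consumed so far (tokens or speech segments)
    written: List[str] = []   # target tokens emitted so far
    actions: List[str] = []   # trace of decisions, useful for latency analysis
    pos = 0
    while len(written) < max_len:
        finished = pos == len(source)
        if policy(len(read), len(written), finished) == "READ":
            read.append(source[pos])
            pos += 1
            actions.append("READ")
        else:
            token = translate_step(read, written)
            if token is None:  # the (hypothetical) model signals end of sentence
                break
            written.append(token)
            actions.append("WRITE")
    return written, actions


def toy_translate_step(read: List[str], written: List[str]) -> Optional[str]:
    """Toy stand-in for a translation model: upper-case the aligned source unit."""
    i = len(written)
    return read[i].upper() if i < len(read) else None


output, trace = simultaneous_decode(["a", "b", "c"], wait_k_policy(2), toy_translate_step)
print(output)
print(trace)
```

In this toy setup, a larger `k` delays the first WRITE action (higher latency, more context), while a smaller `k` writes earlier; a learned policy would replace `wait_k_policy` behind the same interface.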


