End-to-End Speech Translation with the Transformer

IberSPEECH Conference. Pub Date: 2018-11-21. DOI: 10.21437/IBERSPEECH.2018-13

Laura Cross Vila, Carlos Escolano, José A. R. Fonollosa, M. Costa-jussà

Citations: 57

Abstract

Speech Translation has traditionally been addressed by concatenating two tasks: Speech Recognition and Machine Translation. The main drawback of this approach is that recognition errors propagate into translation. Recently, neural approaches to Speech Recognition and Machine Translation have made it possible to tackle the task with an End-to-End Speech Translation architecture. In this paper, we propose using the Transformer architecture, which is based solely on attention mechanisms, to build the End-to-End Speech Translation system. As a contrastive architecture, we use the same Transformer to build the Speech Recognition and Machine Translation systems and perform Speech Translation by concatenating them. Results on a standard Spanish-to-English task show that the end-to-end architecture outperforms the concatenated systems by half a BLEU point.
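The abstract does not describe the model internals, so the following is only a minimal sketch of scaled dot-product attention, the standard building block of the Transformer, and not the authors' implementation. The function names and the toy vectors are hypothetical and chosen for illustration; a real speech model would add learned projections, multiple heads, and batching.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V using plain Python lists.

    queries: list of query vectors, each of length d_k
    keys:    list of key vectors, each of length d_k
    values:  list of value vectors, one per key
    Returns (outputs, weights), with one row per query in each.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    # Toy stand-ins (assumed): three encoded speech frames, one decoder query.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    query = [[1.0, 0.0]]
    outputs, weights = scaled_dot_product_attention(query, frames, frames)
    print("attention weights:", weights[0])
    print("context vector:", outputs[0])
```

In an end-to-end speech translation setting, the keys and values would come from the encoded source speech and the queries from the target-text decoder, so each output word can attend directly to the audio frames without an intermediate transcript.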
