{"title":"神经机器翻译的自注意与动态卷积混合模型","authors":"Zhebin Zhang, Sai Wu, Gang Chen, Dawei Jiang","doi":"10.1109/ICBK50248.2020.00057","DOIUrl":null,"url":null,"abstract":"In sequence-to-sequence learning, models based on the self-attention mechanism dominate the network structures used for neural machine translation. Recently, convolutional networks have been demonstrated to perform excellently on various translation tasks. Despite the fact that self-attention and convolution have different strengths in modeling sequences, few efforts have been devoted to combining them. In this work, we propose a hybrid model that benefits from both mechanisms. We combine a self-attention module and a dynamic convolution module by taking a weighted sum of their outputs where the weights can be dynamically learned by the model during training. Experimental results show that our hybrid model outperforms baseline models built solely on either of these two mechanisms. And we produce new state-of-the-art results on IWSLT’15 English-German dataset.","PeriodicalId":432857,"journal":{"name":"2020 IEEE International Conference on Knowledge Graph (ICKG)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Self-Attention and Dynamic Convolution Hybrid Model for Neural Machine Translation\",\"authors\":\"Zhebin Zhang, Sai Wu, Gang Chen, Dawei Jiang\",\"doi\":\"10.1109/ICBK50248.2020.00057\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In sequence-to-sequence learning, models based on the self-attention mechanism dominate the network structures used for neural machine translation. Recently, convolutional networks have been demonstrated to perform excellently on various translation tasks. Despite the fact that self-attention and convolution have different strengths in modeling sequences, few efforts have been devoted to combining them. In this work, we propose a hybrid model that benefits from both mechanisms. We combine a self-attention module and a dynamic convolution module by taking a weighted sum of their outputs where the weights can be dynamically learned by the model during training. Experimental results show that our hybrid model outperforms baseline models built solely on either of these two mechanisms. 
And we produce new state-of-the-art results on IWSLT’15 English-German dataset.\",\"PeriodicalId\":432857,\"journal\":{\"name\":\"2020 IEEE International Conference on Knowledge Graph (ICKG)\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Conference on Knowledge Graph (ICKG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICBK50248.2020.00057\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Conference on Knowledge Graph (ICKG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBK50248.2020.00057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In sequence-to-sequence learning, models based on the self-attention mechanism dominate the network architectures used for neural machine translation. Recently, convolutional networks have also been shown to perform remarkably well on various translation tasks. Although self-attention and convolution have different strengths in modeling sequences, few efforts have been devoted to combining them. In this work, we propose a hybrid model that benefits from both mechanisms. We combine a self-attention module and a dynamic convolution module by taking a weighted sum of their outputs, where the weights are learned dynamically by the model during training. Experimental results show that our hybrid model outperforms baseline models built solely on either of the two mechanisms, and we achieve new state-of-the-art results on the IWSLT'15 English-German dataset.
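The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the general idea it describes: the output of a self-attention module and the output of a (here heavily simplified) dynamic convolution module are mixed through a learned weight. The DynamicConv layer, the head count, the kernel size, and the scalar sigmoid gate are all illustrative assumptions, not the authors' architecture.

```python
# Sketch only: combines self-attention and a simplified dynamic convolution
# with a learnable mixing weight. Module design and hyperparameters are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv(nn.Module):
    """Simplified dynamic convolution: per-position kernels predicted from the input."""

    def __init__(self, d_model: int, kernel_size: int = 3, heads: int = 8):
        super().__init__()
        self.k, self.h = kernel_size, heads
        # Predicts one kernel of size k per head at every position.
        self.weight_proj = nn.Linear(d_model, heads * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        w = F.softmax(self.weight_proj(x).view(b, t, self.h, self.k), dim=-1)
        # Local window of size k ending at each position (causal left padding, an assumption).
        pad = self.k - 1
        x_pad = F.pad(x, (0, 0, pad, 0))                       # (b, t + k - 1, d)
        windows = x_pad.unfold(1, self.k, 1)                   # (b, t, d, k)
        windows = windows.reshape(b, t, self.h, d // self.h, self.k)
        out = torch.einsum("bthdk,bthk->bthd", windows, w)     # weighted sum over the window
        return out.reshape(b, t, d)


class HybridLayer(nn.Module):
    """Mixes self-attention and dynamic convolution outputs with a learned scalar gate."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, kernel_size: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv = DynamicConv(d_model, kernel_size, n_heads)
        self.gate = nn.Parameter(torch.zeros(1))  # sigmoid(0) = 0.5: equal mix at initialization

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        conv_out = self.conv(x)
        g = torch.sigmoid(self.gate)               # learned mixing weight in (0, 1)
        return g * attn_out + (1.0 - g) * conv_out


if __name__ == "__main__":
    x = torch.randn(2, 10, 512)                    # (batch, seq_len, d_model)
    print(HybridLayer()(x).shape)                  # torch.Size([2, 10, 512])
```

The paper states only that the mixing weights are learned dynamically during training; the scalar sigmoid gate above is the simplest form consistent with that description, and an input-dependent (per-token) gate would be a natural variant.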