A Lightweight Transformer with Convolutional Attention

2020 11th International Conference on Awareness Science and Technology (iCAST), Pub Date: 2020-12-07, DOI: 10.1109/iCAST51195.2020.9319489

Kungan Zeng, Incheon Paik


Citations: 2

Abstract

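Neural machine translation (NMT) has developed rapidly thanks to the application of various deep learning techniques, and how to construct a more effective NMT architecture is attracting growing attention. The Transformer is a state-of-the-art architecture in NMT. It relies entirely on the self-attention mechanism instead of recurrent neural networks (RNNs). Multi-head attention is the crucial component that implements self-attention, and it also strongly affects the size of the model. In this paper, we present a new multi-head attention that incorporates convolution operations. Compared with the base Transformer, our approach effectively reduces the number of parameters. We conduct an experiment, and the results show that the performance of the new model is similar to that of the base model.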
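To make the point about model size concrete, the following is a minimal sketch of how the parameters of standard multi-head attention are counted. It assumes the usual formulation with four dense projections (queries, keys, values, output), each of size d_model x d_model, and a base width of d_model = 512. The function names are hypothetical, and the depthwise-convolution variant is only an illustrative assumption: the abstract does not specify how convolution is combined with attention, so this is not the authors' architecture.

```python
def mha_projection_params(d_model: int, bias: bool = True) -> int:
    """Parameters in the four d_model x d_model projections (Q, K, V, output).

    The number of heads does not appear: with h heads of width d_model / h,
    the per-head projections together still form d_model x d_model matrices.
    """
    per_projection = d_model * d_model + (d_model if bias else 0)
    return 4 * per_projection


def depthwise_conv1d_params(channels: int, kernel_size: int, bias: bool = True) -> int:
    """Parameters of a depthwise 1D convolution: one length-k filter per channel."""
    return channels * kernel_size + (channels if bias else 0)


d_model = 512  # assumed base Transformer width

# Standard multi-head attention: four dense projections.
dense_total = mha_projection_params(d_model)

# Hypothetical lightweight variant (NOT taken from the paper): replace the
# Q, K, V projections with depthwise convolutions of kernel size 3 and keep
# a single dense output projection.
conv_total = 3 * depthwise_conv1d_params(d_model, kernel_size=3) + (
    d_model * d_model + d_model
)

print(f"dense multi-head attention: {dense_total:,} parameters")
print(f"hypothetical conv variant:  {conv_total:,} parameters")
print(f"ratio: {conv_total / dense_total:.3f}")
```

Because the dense projections grow quadratically with d_model while a depthwise filter grows linearly, even this rough comparison shows why the attention block is a natural target for parameter reduction.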




Source venue

2020 11th International Conference on Awareness Science and Technology (iCAST)

Self-citation rate

0.00%

Number of publications