From Transformers to Reformers

Nauman Riaz, Seemab Latif, R. Latif
{"title":"From Transformers to Reformers","authors":"Nauman Riaz, Seemab Latif, R. Latif","doi":"10.1109/ICoDT252288.2021.9441516","DOIUrl":null,"url":null,"abstract":"This paper investigates different deep learning models for various tasks of Natural Language Processing. Recent ongoing research is about the Transformer models and their variations (like the Reformer model). The Recurrent Neural Networks models were efficient up to an only a fixed size of the window. They were unable to capture long-term dependencies for large sequences. To overcome this limitation, the attention mechanism was introduced which is incorporated in the Transformer model. The dot product attention in transformers has a complexity of O(n2) where n is the sequence length. This computation becomes infeasible for large sequences. Also, the residual layers consume a lot of memory because activations need to be stored for back-propagation. To overcome this limitation of memory efficiency and to make transformers learn over larger sequences, the Reformer models were introduced. Our research includes the evaluation of the performance of these two models on various Natural Language Processing tasks.","PeriodicalId":207832,"journal":{"name":"2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICoDT252288.2021.9441516","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper investigates deep learning models for various Natural Language Processing tasks, focusing on recent research into the Transformer model and its variants, such as the Reformer. Recurrent Neural Network models were effective only up to a fixed window size and could not capture long-term dependencies in long sequences. To overcome this limitation, the attention mechanism was introduced and incorporated into the Transformer model. However, the dot-product attention in Transformers has a complexity of O(n²), where n is the sequence length, and this computation becomes infeasible for long sequences. In addition, the residual layers consume a large amount of memory because activations must be stored for back-propagation. To address these memory and efficiency limitations and to let Transformers learn over longer sequences, the Reformer model was introduced. Our research evaluates the performance of these two models on various Natural Language Processing tasks.
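To make the quadratic cost mentioned above concrete, the following is a minimal sketch, not taken from the paper, of scaled dot-product attention in plain NumPy; the function name and the toy sizes (n = 8 tokens, d = 4 dimensions) are illustrative assumptions. The intermediate score matrix has shape (n, n), which is the source of the O(n²) time and memory cost that the Reformer's locality-sensitive-hashing attention and reversible residual layers are designed to avoid.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: arrays of shape (n, d); returns an (n, d) array.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) score matrix -> O(n^2) cost
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of values, shape (n, d)

# Toy usage: 8 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (8, 4)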