{"title":"From Transformers to Reformers","authors":"Nauman Riaz, Seemab Latif, R. Latif","doi":"10.1109/ICoDT252288.2021.9441516","DOIUrl":null,"url":null,"abstract":"This paper investigates different deep learning models for various tasks of Natural Language Processing. Recent ongoing research is about the Transformer models and their variations (like the Reformer model). The Recurrent Neural Networks models were efficient up to an only a fixed size of the window. They were unable to capture long-term dependencies for large sequences. To overcome this limitation, the attention mechanism was introduced which is incorporated in the Transformer model. The dot product attention in transformers has a complexity of O(n2) where n is the sequence length. This computation becomes infeasible for large sequences. Also, the residual layers consume a lot of memory because activations need to be stored for back-propagation. To overcome this limitation of memory efficiency and to make transformers learn over larger sequences, the Reformer models were introduced. Our research includes the evaluation of the performance of these two models on various Natural Language Processing tasks.","PeriodicalId":207832,"journal":{"name":"2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICoDT252288.2021.9441516","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
This paper investigates different deep learning models for various Natural Language Processing tasks. Recent research has focused on Transformer models and their variants, such as the Reformer. Recurrent Neural Network models are effective only up to a fixed window size and are unable to capture long-term dependencies in long sequences. To overcome this limitation, the attention mechanism was introduced and incorporated into the Transformer model. However, the dot-product attention in Transformers has a complexity of O(n²), where n is the sequence length, so the computation becomes infeasible for very long sequences. In addition, the residual layers consume a large amount of memory because activations must be stored for back-propagation. To address these memory limitations and to allow Transformers to learn over longer sequences, the Reformer model was introduced. Our research evaluates the performance of these two models on various Natural Language Processing tasks.
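The quadratic cost mentioned in the abstract comes from the n × n attention score matrix. The following is a minimal sketch, not code from the paper: a plain NumPy implementation of scaled dot-product attention with illustrative (hypothetical) sizes, included only to make the O(n²) memory scaling concrete.

```python
# Minimal sketch (illustrative, not from the paper): scaled dot-product attention
# in NumPy, showing the (n, n) score matrix that grows quadratically with sequence length.
import numpy as np

def dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (n, d) for a sequence of length n and head size d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n, n) matrix: quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d) output

# Tiny, hypothetical example sizes; for n = 64k tokens the (n, n) score matrix
# alone would hold roughly 4 billion floats, which is why full attention
# becomes infeasible for very long sequences.
rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = dot_product_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

The Reformer avoids materializing this full matrix by restricting attention to hashed buckets of similar keys and by using reversible residual layers so that activations need not be stored for back-propagation; the sketch above only illustrates the baseline cost that motivates those changes.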