Gated Slot Attention for Efficient Linear-Time Sequence Modeling

Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu
{"title":"Gated Slot Attention for Efficient Linear-Time Sequence Modeling","authors":"Yu Zhang, Songlin Yang, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu","doi":"arxiv-2409.07146","DOIUrl":null,"url":null,"abstract":"Linear attention Transformers and their gated variants, celebrated for\nenabling parallel training and efficient recurrent inference, still fall short\nin recall-intensive tasks compared to traditional Transformers and demand\nsignificant resources for training from scratch. This paper introduces Gated\nSlot Attention (GSA), which enhances Attention with Bounded-memory-Control\n(ABC) by incorporating a gating mechanism inspired by Gated Linear Attention\n(GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing\ncontext-aware memory reading and adaptive forgetting to improve memory capacity\nwhile maintaining compact recurrent state size. This design greatly enhances\nboth training and inference efficiency through GLA's hardware-efficient\ntraining algorithm and reduced state size. Additionally, retaining the softmax\noperation is particularly beneficial in \"finetuning pretrained Transformers to\nRNNs\" (T2R) settings, reducing the need for extensive training from scratch.\nExtensive experiments confirm GSA's superior performance in scenarios requiring\nin-context recall and in T2R settings.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":"34 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07146","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for training from scratch. This paper introduces Gated Slot Attention (GSA), which enhances Attention with Bounded-memory-Control (ABC) by incorporating a gating mechanism inspired by Gated Linear Attention (GLA). Essentially, GSA comprises a two-layer GLA linked via softmax, utilizing context-aware memory reading and adaptive forgetting to improve memory capacity while maintaining compact recurrent state size. This design greatly enhances both training and inference efficiency through GLA's hardware-efficient training algorithm and reduced state size. Additionally, retaining the softmax operation is particularly beneficial in "finetuning pretrained Transformers to RNNs" (T2R) settings, reducing the need for extensive training from scratch. Extensive experiments confirm GSA's superior performance in scenarios requiring in-context recall and in T2R settings.
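
To make the mechanism concrete, below is a minimal, single-head sketch in PyTorch of the recurrent form suggested by the abstract: a bounded set of m memory slots is written with a gated (forgetting) update and read with a softmax over slots. The tensor names (q, k, v, alpha), their shapes, and the exact gate parameterization are illustrative assumptions, not the paper's official implementation or API.

```python
# Minimal recurrent sketch of the gated-slot update described in the abstract
# (assumed formulation): m slot memories, per-slot forget gates in (0, 1),
# and a softmax read over slots. Single head, no batching, for clarity only.
import torch


def gsa_recurrent(q, k, v, alpha):
    """Sequential GSA-style scan for one head.

    q, k:   (T, d)  queries / keys
    v:      (T, d)  values
    alpha:  (T, m)  per-slot forget gates in (0, 1)
    Returns (T, d) outputs.
    """
    T, d = q.shape
    m = alpha.shape[1]
    K = q.new_zeros(m, d)   # slot-key memory
    V = q.new_zeros(m, d)   # slot-value memory
    outs = []
    for t in range(T):
        a = alpha[t].unsqueeze(1)                  # (m, 1)
        # Adaptive forgetting: decay old slot contents, write the new token.
        K = a * K + (1 - a) * k[t].unsqueeze(0)    # (m, d)
        V = a * V + (1 - a) * v[t].unsqueeze(0)    # (m, d)
        # Context-aware read: softmax over the m slots, then mix slot values.
        attn = torch.softmax(K @ q[t], dim=0)      # (m,)
        outs.append(attn @ V)                      # (d,)
    return torch.stack(outs)


# Example: T=16 tokens, head dim d=32, m=8 slots.
T, d, m = 16, 32, 8
q, k, v = (torch.randn(T, d) for _ in range(3))
alpha = torch.sigmoid(torch.randn(T, m))           # gates in (0, 1)
print(gsa_recurrent(q, k, v, alpha).shape)         # torch.Size([16, 32])
```

The paper's training path relies on GLA's chunkwise, hardware-efficient algorithm rather than a token-by-token loop; the scan above is only meant to illustrate the compact recurrent state and softmax-linked read that the abstract describes.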