AttentionRC: A Novel Approach to Improve Locality Sensitive Hashing Attention on Dual-Addressing Memory

IF 2.7 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Chun-Lin Chu;Yun-Chih Chen;Wei Cheng;Ing-Chao Lin;Yuan-Hao Chang
{"title":"AttentionRC:在双寻址内存上改进位置敏感散列注意力的新方法","authors":"Chun-Lin Chu;Yun-Chih Chen;Wei Cheng;Ing-Chao Lin;Yuan-Hao Chang","doi":"10.1109/TCAD.2024.3447217","DOIUrl":null,"url":null,"abstract":"Attention is a crucial component of the Transformer architecture and a key factor in its success. However, it suffers from quadratic growth in time and space complexity as input sequence length increases. One popular approach to address this issue is the Reformer model, which uses locality-sensitive hashing (LSH) attention to reduce computational complexity. LSH attention hashes similar tokens in the input sequence to the same bucket and attends tokens only within the same bucket. Meanwhile, a new emerging nonvolatile memory (NVM) architecture, row column NVM (RC-NVM), has been proposed to support row- and column-oriented addressing (i.e., dual addressing). In this work, we present AttentionRC, which takes advantage of RC-NVM to further improve the efficiency of LSH attention. We first propose an LSH-friendly data mapping strategy that improves memory write and read cycles by 60.9% and 4.9%, respectively. Then, we propose a sort-free RC-aware bucket access and a swap strategy that utilizes dual-addressing to reduce 38% of the data access cycles in attention. Finally, by taking advantage of dual-addressing, we propose transpose-free attention to eliminate the transpose operations that were previously required by the attention, resulting in a 51% reduction in the matrix multiplication time.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3925-3936"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AttentionRC: A Novel Approach to Improve Locality Sensitive Hashing Attention on Dual-Addressing Memory\",\"authors\":\"Chun-Lin Chu;Yun-Chih Chen;Wei Cheng;Ing-Chao Lin;Yuan-Hao Chang\",\"doi\":\"10.1109/TCAD.2024.3447217\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Attention is a crucial component of the Transformer architecture and a key factor in its success. However, it suffers from quadratic growth in time and space complexity as input sequence length increases. One popular approach to address this issue is the Reformer model, which uses locality-sensitive hashing (LSH) attention to reduce computational complexity. LSH attention hashes similar tokens in the input sequence to the same bucket and attends tokens only within the same bucket. Meanwhile, a new emerging nonvolatile memory (NVM) architecture, row column NVM (RC-NVM), has been proposed to support row- and column-oriented addressing (i.e., dual addressing). In this work, we present AttentionRC, which takes advantage of RC-NVM to further improve the efficiency of LSH attention. We first propose an LSH-friendly data mapping strategy that improves memory write and read cycles by 60.9% and 4.9%, respectively. Then, we propose a sort-free RC-aware bucket access and a swap strategy that utilizes dual-addressing to reduce 38% of the data access cycles in attention. 
Finally, by taking advantage of dual-addressing, we propose transpose-free attention to eliminate the transpose operations that were previously required by the attention, resulting in a 51% reduction in the matrix multiplication time.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"3925-3936\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745845/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745845/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0

Abstract

Attention is a crucial component of the Transformer architecture and a key factor in its success. However, it suffers from quadratic growth in time and space complexity as input sequence length increases. One popular approach to address this issue is the Reformer model, which uses locality-sensitive hashing (LSH) attention to reduce computational complexity. LSH attention hashes similar tokens in the input sequence to the same bucket and attends only to tokens within the same bucket. Meanwhile, an emerging nonvolatile memory (NVM) architecture, row column NVM (RC-NVM), has been proposed to support row- and column-oriented addressing (i.e., dual addressing). In this work, we present AttentionRC, which takes advantage of RC-NVM to further improve the efficiency of LSH attention. We first propose an LSH-friendly data mapping strategy that improves memory write and read cycles by 60.9% and 4.9%, respectively. Then, we propose a sort-free RC-aware bucket access and a swap strategy that utilizes dual-addressing to reduce the data access cycles in attention by 38%. Finally, by taking advantage of dual-addressing, we propose transpose-free attention to eliminate the transpose operations that were previously required by the attention, resulting in a 51% reduction in the matrix multiplication time.
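For readers unfamiliar with the bucketing idea the abstract builds on, the following is a minimal NumPy sketch of Reformer-style LSH attention: tokens are hashed with an angular random-projection hash, and attention is computed only within each bucket. This is an illustration under simplifying assumptions (a single hash round, shared query/key vectors, and hypothetical function names `hash_tokens` and `lsh_attention`), not the AttentionRC or Reformer implementation.

```python
# Minimal sketch of LSH attention bucketing (illustrative only).
import numpy as np

def hash_tokens(x, n_buckets, rng):
    """Hash each token vector to one of n_buckets via random projections.

    Angular LSH: project onto n_buckets/2 random directions and take the
    argmax over the projections and their negations, so vectors with a
    small angle between them tend to land in the same bucket.
    """
    d = x.shape[-1]
    r = rng.standard_normal((d, n_buckets // 2))
    proj = x @ r                                   # (seq_len, n_buckets/2)
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

def lsh_attention(q, v, n_buckets=4, seed=0):
    """Attend only among tokens that fall into the same hash bucket.

    Uses shared query/key vectors (q), as in Reformer, so similar tokens
    hash together and full O(L^2) attention is avoided.
    """
    rng = np.random.default_rng(seed)
    buckets = hash_tokens(q, n_buckets, rng)
    out = np.zeros_like(v)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]            # tokens in this bucket
        qb = q[idx]
        scores = qb @ qb.T / np.sqrt(q.shape[-1])  # bucket-local scores
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[idx] = weights @ v[idx]                # bucket-local attention
    return out

# Example: 16 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(1)
q = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
print(lsh_attention(q, v).shape)  # (16, 8)
```

The bucket-local loop is where the paper's memory-layout concerns arise: gathering each bucket's tokens is a scattered access pattern in row-major memory, which is what RC-NVM's dual (row- and column-oriented) addressing is designed to serve more cheaply.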
Source Journal

CiteScore: 5.60
Self-citation rate: 13.80%
Annual publications: 500
Review time: 7 months

Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.