Title: AttentionRC: A Novel Approach to Improve Locality Sensitive Hashing Attention on Dual-Addressing Memory
Authors: Chun-Lin Chu; Yun-Chih Chen; Wei Cheng; Ing-Chao Lin; Yuan-Hao Chang
DOI: 10.1109/TCAD.2024.3447217
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 3925-3936
Publication date: 2024-11-06 (Journal Article)
JCR: Q2 (Computer Science, Hardware & Architecture); Impact factor: 2.7; CAS region: 3 (Computer Science)
URL: https://ieeexplore.ieee.org/document/10745845/
Citations: 0
Abstract
Attention is a crucial component of the Transformer architecture and a key factor in its success. However, it suffers from quadratic growth in time and space complexity as the input sequence length increases. One popular approach to address this issue is the Reformer model, which uses locality-sensitive hashing (LSH) attention to reduce computational complexity. LSH attention hashes similar tokens in the input sequence to the same bucket and attends only to tokens within the same bucket. Meanwhile, an emerging nonvolatile memory (NVM) architecture, row-column NVM (RC-NVM), has been proposed to support row- and column-oriented addressing (i.e., dual addressing). In this work, we present AttentionRC, which takes advantage of RC-NVM to further improve the efficiency of LSH attention. We first propose an LSH-friendly data mapping strategy that improves memory write and read cycles by 60.9% and 4.9%, respectively. Then, we propose a sort-free RC-aware bucket access and a swap strategy that utilizes dual addressing to reduce the data access cycles in attention by 38%. Finally, by taking advantage of dual addressing, we propose transpose-free attention to eliminate the transpose operations previously required by attention, resulting in a 51% reduction in matrix multiplication time.
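The LSH attention scheme the abstract summarizes can be sketched in a few lines: tokens are hashed to buckets with random rotations (as in the Reformer), and softmax attention is computed only among tokens sharing a bucket. The hashing scheme, shapes, and shared query/key matrix below are illustrative assumptions based on the Reformer description, not the paper's RC-NVM implementation.

```python
import numpy as np

def lsh_bucket_attention(x, n_buckets=4, seed=0):
    """Illustrative LSH attention: hash tokens to buckets via random
    rotations, then attend only within each bucket (Reformer-style).
    x: (seq_len, d) token embeddings; returns (output, bucket ids)."""
    rng = np.random.default_rng(seed)
    seq_len, d = x.shape
    # Random-rotation hashing: bucket id is argmax over [xR, -xR].
    R = rng.standard_normal((d, n_buckets // 2))
    proj = x @ R
    buckets = np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)
    out = np.zeros_like(x)
    for b in np.unique(buckets):
        idx = np.where(buckets == b)[0]
        # Shared QK as in the Reformer; values are the same tokens.
        q = k = v = x[idx]
        scores = (q @ k.T) / np.sqrt(d)
        # Numerically stable softmax over each bucket's scores.
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[idx] = w @ v
    return out, buckets
```

Because each token attends to roughly seq_len / n_buckets others instead of all seq_len, the cost drops from quadratic toward linear in sequence length, which is the complexity reduction the Reformer targets and AttentionRC accelerates at the memory level.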
Journal Introduction
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.