Random Projection-Based Locality-Sensitive Hashing in a Memristor Crossbar Array with Stochasticity for Sparse Self-Attention-Based Transformer

IF 5.3 | JCR Q2 (MATERIALS SCIENCE, MULTIDISCIPLINARY) | CAS Tier 2 (Materials Science)
Xinxin Wang, Ilia Valov, Huanglong Li
{"title":"Random Projection-Based Locality-Sensitive Hashing in a Memristor Crossbar Array with Stochasticity for Sparse Self-Attention-Based Transformer","authors":"Xinxin Wang,&nbsp;Ilia Valov,&nbsp;Huanglong Li","doi":"10.1002/aelm.202300850","DOIUrl":null,"url":null,"abstract":"<p>Self-attention mechanism is critically central to the state-of-the-art transformer models. Because the standard full self-attention has quadratic complexity with respect to the input's length L, resulting in prohibitively large memory for very long sequences, sparse self-attention enabled by random projection (RP)-based locality-sensitive hashing (LSH) has recently been proposed to reduce the complexity to O(L log L). However, in current digital computing hardware with a von Neumann architecture, RP, which is essentially a matrix multiplication operation, incurs unavoidable time and energy-consuming data shuttling between off-chip memory and processing units. In addition, it is known that digital computers simply cannot generate provably random numbers. With the emerging analog memristive technology, it is shown that it is feasible to harness the intrinsic device-to-device variability in the memristor crossbar array for implementing the RP matrix and perform RP-LSH computation in memory. On this basis, sequence prediction tasks are performed with a sparse self-attention-based Transformer in a hybrid software-hardware approach, achieving a testing accuracy over 70% with much less computational complexity. By further harnessing the cycle-to-cycle variability for multi-round hashing, 12% increase in the testing accuracy is demonstrated. This work extends the range of applications of memristor crossbar arrays to the state-of-the-art large language models (LLMs).</p>","PeriodicalId":110,"journal":{"name":"Advanced Electronic Materials","volume":"10 10","pages":""},"PeriodicalIF":5.3000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aelm.202300850","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Electronic Materials","FirstCategoryId":"88","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/aelm.202300850","RegionNum":2,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

The self-attention mechanism is central to state-of-the-art transformer models. Because standard full self-attention has quadratic complexity in the input length L, demanding prohibitively large memory for very long sequences, sparse self-attention enabled by random projection (RP)-based locality-sensitive hashing (LSH) has recently been proposed to reduce the complexity to O(L log L). However, on current digital hardware with a von Neumann architecture, RP, which is essentially a matrix multiplication, incurs unavoidable time- and energy-consuming data shuttling between off-chip memory and processing units. In addition, digital computers cannot generate provably random numbers. Using emerging analog memristive technology, it is shown that the intrinsic device-to-device variability of a memristor crossbar array can be harnessed to implement the RP matrix and perform RP-LSH computation in memory. On this basis, sequence prediction tasks are performed with a sparse self-attention-based transformer in a hybrid software-hardware approach, achieving a testing accuracy of over 70% at much lower computational complexity. By further harnessing cycle-to-cycle variability for multi-round hashing, a 12% increase in testing accuracy is demonstrated. This work extends the range of applications of memristor crossbar arrays to state-of-the-art large language models (LLMs).
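
The RP step described above, assuming the standard angular-LSH construction used in Reformer-style sparse attention (the paper does not publish code, so names such as `rp_hash`, `n_buckets`, and `n_rounds` are illustrative), can be sketched in a few lines of NumPy. The Gaussian matrix `R` stands in for the stochastic memristor conductances: in hardware, the product `x @ R` would be the in-memory matrix-vector multiplication on the crossbar, and re-sampling `R` each round mirrors the cycle-to-cycle variability used for multi-round hashing.

```python
import numpy as np

def rp_hash(x, n_buckets, n_rounds=1, seed=None):
    """Angular RP-LSH: assign each row of x to one of n_buckets per round.

    Illustrative sketch only: a software Gaussian matrix plays the role of
    the stochastic memristor conductances described in the abstract.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    rounds = []
    for _ in range(n_rounds):            # multi-round hashing
        R = rng.standard_normal((d, n_buckets // 2))   # random projection matrix
        proj = x @ R                     # on hardware: in-memory MVM on the crossbar
        # Angular LSH: argmax over [proj, -proj] picks one of n_buckets bins,
        # so nearby vectors (small angle) tend to land in the same bucket.
        rounds.append(np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1))
    return np.stack(rounds)              # shape (n_rounds, L)

# Example: hash a length-512 sequence of 64-dim query vectors.
q = np.random.default_rng(0).standard_normal((512, 64))
print(rp_hash(q, n_buckets=16, n_rounds=4).shape)  # (4, 512)
```

Sparse attention then restricts each query to keys sharing its bucket, replacing the full L × L score matrix with within-bucket blocks; sorting tokens by bucket and chunking yields the O(L log L) complexity cited above.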

Source Journal
Advanced Electronic Materials
Categories: NANOSCIENCE & NANOTECHNOLOGY; MATERIALS SCIENCE, MULTIDISCIPLINARY
CiteScore: 11.00
Self-citation rate: 3.20%
Annual articles: 433
Journal introduction: Advanced Electronic Materials is an interdisciplinary forum for peer-reviewed, high-quality, high-impact research in the fields of materials science, physics, and engineering of electronic and magnetic materials. It covers the physics and physical properties of electronic and magnetic materials, spintronics, electronics, device physics and engineering, micro- and nano-electromechanical systems, and organic electronics, in addition to fundamental research.