Random Projection-Based Locality-Sensitive Hashing in a Memristor Crossbar Array with Stochasticity for Sparse Self-Attention-Based Transformer

IF 5.3 | JCR Q2 (MATERIALS SCIENCE, MULTIDISCIPLINARY) | CAS Tier 2 (Materials Science)
Xinxin Wang, Ilia Valov, Huanglong Li
{"title":"Random Projection-Based Locality-Sensitive Hashing in a Memristor Crossbar Array with Stochasticity for Sparse Self-Attention-Based Transformer","authors":"Xinxin Wang,&nbsp;Ilia Valov,&nbsp;Huanglong Li","doi":"10.1002/aelm.202300850","DOIUrl":null,"url":null,"abstract":"<p>Self-attention mechanism is critically central to the state-of-the-art transformer models. Because the standard full self-attention has quadratic complexity with respect to the input's length L, resulting in prohibitively large memory for very long sequences, sparse self-attention enabled by random projection (RP)-based locality-sensitive hashing (LSH) has recently been proposed to reduce the complexity to O(L log L). However, in current digital computing hardware with a von Neumann architecture, RP, which is essentially a matrix multiplication operation, incurs unavoidable time and energy-consuming data shuttling between off-chip memory and processing units. In addition, it is known that digital computers simply cannot generate provably random numbers. With the emerging analog memristive technology, it is shown that it is feasible to harness the intrinsic device-to-device variability in the memristor crossbar array for implementing the RP matrix and perform RP-LSH computation in memory. On this basis, sequence prediction tasks are performed with a sparse self-attention-based Transformer in a hybrid software-hardware approach, achieving a testing accuracy over 70% with much less computational complexity. By further harnessing the cycle-to-cycle variability for multi-round hashing, 12% increase in the testing accuracy is demonstrated. This work extends the range of applications of memristor crossbar arrays to the state-of-the-art large language models (LLMs).</p>","PeriodicalId":110,"journal":{"name":"Advanced Electronic Materials","volume":"10 10","pages":""},"PeriodicalIF":5.3000,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/aelm.202300850","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advanced Electronic Materials","FirstCategoryId":"88","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/aelm.202300850","RegionNum":2,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, MULTIDISCIPLINARY","Score":null,"Total":0}
Citations: 0

Abstract

The self-attention mechanism is central to state-of-the-art transformer models. Because standard full self-attention has quadratic complexity in the input length L, demanding prohibitively large memory for very long sequences, sparse self-attention enabled by random projection (RP)-based locality-sensitive hashing (LSH) has recently been proposed to reduce the complexity to O(L log L). However, on current digital hardware with a von Neumann architecture, RP, which is essentially a matrix multiplication, incurs unavoidable time- and energy-consuming data shuttling between off-chip memory and processing units. In addition, digital computers cannot generate provably random numbers. Using emerging analog memristive technology, it is shown that the intrinsic device-to-device variability of a memristor crossbar array can be harnessed to implement the RP matrix and perform RP-LSH computation in memory. On this basis, sequence prediction tasks are performed with a sparse self-attention-based transformer in a hybrid software-hardware approach, achieving a testing accuracy of over 70% at much lower computational complexity. By further harnessing cycle-to-cycle variability for multi-round hashing, a 12% increase in testing accuracy is demonstrated. This work extends the range of applications of memristor crossbar arrays to state-of-the-art large language models (LLMs).
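
The RP step described above, assuming the standard angular-LSH construction used in Reformer-style sparse attention (the paper does not publish code, so names such as `rp_hash`, `n_buckets`, and `n_rounds` are illustrative), can be sketched in a few lines of NumPy. The Gaussian matrix `R` stands in for the stochastic memristor conductances: in hardware, the product `x @ R` would be the in-memory matrix-vector multiplication on the crossbar, and re-sampling `R` each round mirrors the cycle-to-cycle variability used for multi-round hashing.

```python
import numpy as np

def rp_hash(x, n_buckets, n_rounds=1, seed=None):
    """Angular RP-LSH: assign each row of x to one of n_buckets per round.

    Illustrative sketch only: a software Gaussian matrix plays the role of
    the stochastic memristor conductances described in the abstract.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    rounds = []
    for _ in range(n_rounds):            # multi-round hashing
        R = rng.standard_normal((d, n_buckets // 2))   # random projection matrix
        proj = x @ R                     # on hardware: in-memory MVM on the crossbar
        # Angular LSH: argmax over [proj, -proj] picks one of n_buckets bins,
        # so nearby vectors (small angle) tend to land in the same bucket.
        rounds.append(np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1))
    return np.stack(rounds)              # shape (n_rounds, L)

# Example: hash a length-512 sequence of 64-dim query vectors.
q = np.random.default_rng(0).standard_normal((512, 64))
print(rp_hash(q, n_buckets=16, n_rounds=4).shape)  # (4, 512)
```

Sparse attention then restricts each query to keys sharing its bucket, replacing the full L × L score matrix with within-bucket blocks; sorting tokens by bucket and chunking yields the O(L log L) complexity cited above.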

Source Journal
Advanced Electronic Materials
Categories: NANOSCIENCE & NANOTECHNOLOGY; MATERIALS SCIENCE, MULTIDISCIPLINARY
CiteScore: 11.00
Self-citation rate: 3.20%
Annual articles: 433
Journal introduction: Advanced Electronic Materials is an interdisciplinary forum for peer-reviewed, high-quality, high-impact research in the fields of materials science, physics, and engineering of electronic and magnetic materials. It covers the physics and physical properties of electronic and magnetic materials, spintronics, electronics, device physics and engineering, micro- and nano-electromechanical systems, and organic electronics, in addition to fundamental research.