ISSA: Architecting CNN Accelerators Using Input-Skippable, Set-Associative Computing-in-Memory

IF 3.6 · Zone 2, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers · Pub Date: 2024-06-04 · DOI: 10.1109/TC.2024.3404060

Yun-Chen Lo; Jun-Shen Wu; Chia-Chun Wang; Yu-Chih Tsai; Chih-Chen Yeh; Wen-Chien Ting; Ren-Shuo Liu

IEEE Transactions on Computers, vol. 73, no. 9, pp. 2136-2149, 2024. https://ieeexplore.ieee.org/document/10547572/

Citations: 0

Abstract

Among several emerging architectures, computing in memory (CIM), which features in-situ analog computation, is a potential solution to the data movement bottleneck of the von Neumann architecture for artificial intelligence (AI). Interestingly, other strengths of CIM, quite distinct from in-situ analog computation, are not yet widely known. In this work, we point out that mutually stationary vectors (MSVs), which can be maximized by introducing associativity to CIM, are another inherent strength unique to CIM. With MSVs, CIM has significant freedom to dynamically vectorize the stored data (e.g., weights) and perform agile computation using the dynamically formed vectors. We have designed and realized a set-associative CIM (SA-CIM) silicon prototype, with corresponding architecture and acceleration schemes, in the TSMC 28 nm process. More specifically, the contributions of this paper are fivefold: 1) We identify MSVs as a new feature that can be exploited to address the current performance and energy challenges of CIM-based hardware. 2) We propose SA-CIM to enhance MSVs (input-reordering flexibility) for skipping zeros, small values, and sparse vectors. 3) We propose channel swapping to enhance the zero-skipping technique. 4) We propose a transposed systolic dataflow to efficiently conduct conv3$\times$3 while remaining capable of exploiting input-skipping schemes. 5) We propose a design flow to search for optimal aggressive skipping scheme setups while satisfying the accuracy loss constraint. The proposed ISSA architecture improves throughput by $1.91\times$ to $2.97\times$ and energy efficiency by $2.5\times$ to $4.2\times$.
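The idea of skipping zeros and small input values against stationary weights can be illustrated in software. The sketch below is only an analogy under stated assumptions, not the ISSA hardware or the authors' method: the function names (`skip_inputs`, `skipped_dot`), the integer quantized values, and the `threshold` parameter are all hypothetical. It shows that when weights stay in place and any subset of inputs may be applied to them, a dot product only needs one multiply-accumulate step per surviving input.

```python
from typing import List, Tuple


def skip_inputs(inputs: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (index, value) pairs whose magnitude exceeds the threshold.

    threshold = 0 drops exact zeros only (lossless);
    a larger threshold also drops small values (lossy, hypothetical policy).
    """
    return [(i, x) for i, x in enumerate(inputs) if abs(x) > threshold]


def skipped_dot(weights: List[int], inputs: List[int],
                threshold: int = 0) -> Tuple[int, int]:
    """Dot product that visits only the surviving inputs.

    Returns (result, number of multiply-accumulate steps performed).
    The weights are never moved; each surviving input selects its weight by index.
    """
    kept = skip_inputs(inputs, threshold)
    acc = 0
    for i, x in kept:
        acc += weights[i] * x
    return acc, len(kept)


# Illustrative data (made up): a sparse activation vector against fixed weights.
weights = [3, -2, 5, 1, 4, -1, 2, 6]
inputs = [0, 12, 0, 0, 7, 0, 1, 0]

exact, steps_zero = skipped_dot(weights, inputs)                  # zero skipping
approx, steps_small = skipped_dot(weights, inputs, threshold=1)   # also skip |x| <= 1

print(f"dense steps: {len(inputs)}")
print(f"zero skipping: result={exact}, steps={steps_zero}")
print(f"small-value skipping: result={approx}, steps={steps_small}")
```

Zero skipping leaves the result unchanged while reducing the step count, whereas small-value skipping trades a small error for fewer steps. That trade-off is why the abstract pairs aggressive skipping with an accuracy-loss constraint.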


Source journal

IEEE Transactions on Computers (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 6.60

Self-citation rate: 5.40%

Publication volume: 199

Review time: 6.0 months

About the journal: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.