ISSA: Architecting CNN Accelerators Using Input-Skippable, Set-Associative Computing-in-Memory
Yun-Chen Lo; Jun-Shen Wu; Chia-Chun Wang; Yu-Chih Tsai; Chih-Chen Yeh; Wen-Chien Ting; Ren-Shuo Liu
IEEE Transactions on Computers, vol. 73, no. 9, pp. 2136-2149. Published 2024-06-04.
DOI: 10.1109/TC.2024.3404060
URL: https://ieeexplore.ieee.org/document/10547572/
Citation count: 0
Abstract
Among several emerging architectures, computing in memory (CIM), which features in-situ analog computation, is a potential solution to the data movement bottleneck of the von Neumann architecture for artificial intelligence (AI). Interestingly, other strengths of CIM, distinct from in-situ analog computation, are not yet widely known. In this work, we point out that mutually stationary vectors (MSVs), which can be maximized by introducing associativity to CIM, are another inherent strength unique to CIM. With MSVs, CIM has significant freedom to dynamically vectorize the stored data (e.g., weights) and to perform agile computation using the dynamically formed vectors. We have designed and realized a set-associative CIM (SA-CIM) silicon prototype, together with the corresponding architecture and acceleration schemes, in the TSMC 28 nm process. More specifically, the contributions of this paper are fivefold: 1) We identify MSVs as new features that can be exploited to address the current performance and energy challenges of CIM-based hardware. 2) We propose SA-CIM to enhance MSVs (input-reordering flexibility) for skipping zeros, small values, and sparse vectors. 3) We propose channel swapping to enhance the zero-skipping technique. 4) We propose a transposed systolic dataflow that efficiently performs conv3$\times$3 while remaining capable of exploiting input-skipping schemes. 5) We propose a design flow that searches for optimal aggressive skipping scheme setups while satisfying the accuracy loss constraint. The proposed ISSA architecture improves throughput by $1.91\times$ to $2.97\times$ and energy efficiency by $2.5\times$ to $4.2\times$.
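The input-skipping idea mentioned in the abstract can be illustrated with a rough software analogy. The sketch below is not the paper's hardware design or dataflow; the function name `skipping_dot`, the `threshold` parameter, and the sample values are all hypothetical and chosen only for illustration. It shows that skipping zero inputs leaves a dot product exact while reducing the number of inputs processed, whereas skipping small nonzero inputs reduces the work further at the cost of an approximation error.

```python
from typing import Sequence, Tuple


def skipping_dot(
    inputs: Sequence[int], weights: Sequence[int], threshold: int = 0
) -> Tuple[int, int]:
    """Return (dot product, number of inputs actually processed).

    Inputs whose magnitude is <= threshold are skipped. With threshold=0
    only zeros are skipped, so the result is exact. A larger threshold
    also skips small values, which makes the result approximate.
    This is an illustrative software model, not the ISSA hardware.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    processed = 0
    for x, w in zip(inputs, weights):
        if abs(x) <= threshold:
            continue  # skipped input: no multiply-accumulate is spent on it
        acc += x * w
        processed += 1
    return acc, processed


# Hypothetical sparse activations (e.g., after ReLU) and stationary weights.
activations = [0, 3, 0, 0, 1, 7, 0, 2]
weights = [5, -1, 4, 2, 6, 1, -3, 2]

exact, n_exact = skipping_dot(activations, weights)                 # zero skipping
approx, n_approx = skipping_dot(activations, weights, threshold=1)  # small-value skipping
print(exact, n_exact)    # 14 4
print(approx, n_approx)  # 8 3
```

In this toy case zero skipping processes 4 of 8 inputs with no error, while the more aggressive threshold processes 3 of 8 and changes the result from 14 to 8. This is the kind of accuracy-versus-work trade-off that the abstract's fifth contribution, the search for skipping setups under an accuracy loss constraint, is described as addressing.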
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.