ArXrCiM: Architectural Exploration of Application-Specific Resonant SRAM Compute-in-Memory

IF 2.8 2区 工程技术 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Dhandeep Challagundla;Ignatius Bezzam;Riadul Islam
{"title":"ArXrCiM: Architectural Exploration of Application-Specific Resonant SRAM Compute-in-Memory","authors":"Dhandeep Challagundla;Ignatius Bezzam;Riadul Islam","doi":"10.1109/TVLSI.2024.3502359","DOIUrl":null,"url":null,"abstract":"While general-purpose computing follows von Neumann’s architecture, the data movement between memory and processor elements dictates the processor’s performance. The evolving compute-in-memory (CiM) paradigm tackles this issue by facilitating simultaneous processing and storage within static random-access memory (SRAM) elements. Numerous design decisions taken at different levels of hierarchy affect the figures of merit (FoMs) of SRAM, such as power, performance, area, and yield. The absence of a rapid assessment mechanism for the impact of changes at different hierarchy levels on global FoMs poses a challenge to accurately evaluating innovative SRAM designs. This article presents an automation tool designed to optimize the energy and latency of SRAM designs incorporating diverse implementation strategies for executing logic operations within the SRAM. The tool structure allows easy comparison across different array topologies and various design strategies to result in energy-efficient implementations. Our study involves a comprehensive comparison of over 6900+ distinct design implementation strategies for École Polytechnique Fédérale de Lausanne (EPFL) combinational benchmark circuits on the energy-recycling resonant CiM (rCiM) architecture designed using Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm technology. When provided with a combinational circuit, the tool aims to generate an energy-efficient implementation strategy tailored to the specified input memory and latency constraints. The tool reduces 80.9% of energy consumption on average across all benchmarks while using the six-topology implementation compared with the baseline implementation of single-macro topology by considering the parallel processing capability of rCiM cache size ranging from 4 to 192 kB.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"179-192"},"PeriodicalIF":2.8000,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10767429/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

While general-purpose computing follows von Neumann’s architecture, the data movement between memory and processor elements dictates the processor’s performance. The evolving compute-in-memory (CiM) paradigm tackles this issue by facilitating simultaneous processing and storage within static random-access memory (SRAM) elements. Numerous design decisions taken at different levels of hierarchy affect the figures of merit (FoMs) of SRAM, such as power, performance, area, and yield. The absence of a rapid assessment mechanism for the impact of changes at different hierarchy levels on global FoMs poses a challenge to accurately evaluating innovative SRAM designs. This article presents an automation tool designed to optimize the energy and latency of SRAM designs incorporating diverse implementation strategies for executing logic operations within the SRAM. The tool structure allows easy comparison across different array topologies and various design strategies to result in energy-efficient implementations. Our study involves a comprehensive comparison of over 6900+ distinct design implementation strategies for École Polytechnique Fédérale de Lausanne (EPFL) combinational benchmark circuits on the energy-recycling resonant CiM (rCiM) architecture designed using Taiwan Semiconductor Manufacturing Company (TSMC) 28-nm technology. When provided with a combinational circuit, the tool aims to generate an energy-efficient implementation strategy tailored to the specified input memory and latency constraints. The tool reduces 80.9% of energy consumption on average across all benchmarks while using the six-topology implementation compared with the baseline implementation of single-macro topology by considering the parallel processing capability of rCiM cache size ranging from 4 to 192 kB.
ArXrCiM:特定应用谐振式SRAM内存计算的架构探索
虽然通用计算遵循冯·诺伊曼的架构,但内存和处理器元素之间的数据移动决定了处理器的性能。不断发展的内存计算(CiM)范例通过促进静态随机存取存储器(SRAM)元素内的同步处理和存储来解决这个问题。在不同层次上所做的许多设计决策会影响SRAM的性能指标(FoMs),如功率、性能、面积和良率。缺乏一种快速评估机制来评估不同层次变化对全球FoMs的影响,这对准确评估创新SRAM设计提出了挑战。本文提出了一种自动化工具,旨在优化SRAM设计的能量和延迟,该设计结合了在SRAM内执行逻辑操作的多种实现策略。工具结构允许轻松比较不同的阵列拓扑和各种设计策略,从而实现节能。我们的研究涉及对使用台湾半导体制造公司(TSMC) 28纳米技术设计的能量回收谐振CiM (rCiM)架构上的École Polytechnique fsamdsamrale de Lausanne (EPFL)组合基准电路的6900多种不同设计实现策略的全面比较。当提供组合电路时,该工具旨在生成针对指定输入存储器和延迟限制的节能实现策略。通过考虑rCiM缓存大小范围从4到192 kB的并行处理能力,与单宏拓扑的基线实现相比,该工具在使用六拓扑实现时,在所有基准测试中平均减少了80.9%的能耗。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
6.40
自引率
7.10%
发文量
187
审稿时长
3.6 months
期刊介绍: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels. To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信