System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory

Impact factor: 2.0 | JCR quartile: Q3 | Category: Computer Science, Hardware & Architecture
Leandro M. Giacomini Rocha;Mohamed Naeim;Guilherme Paim;Moritz Brunion;Priya Venugopal;Dragomir Milojevic;James Myers;Mustafa Badaroglu;Marian Verhelst;Julien Ryckaert;Dwaipayan Biswas
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 10, pp. 125-134. Published 2024-11-11.
DOI: 10.1109/JXCDC.2024.3496118
https://ieeexplore.ieee.org/document/10750212/

Abstract

High-performance edge artificial intelligence (Edge-AI) inference applications aim for high energy efficiency, high memory density, and a small form factor. Meeting these goals requires design-space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technology. In this article, we present a system-technology co-optimization (STCO) framework that interfaces workload-driven system scaling challenges with physical design-enabled technology offerings. The framework is built on three engines: a physical design characterization engine, a dataflow mapping optimizer, and a system efficiency predictor. It uses a systolic array accelerator to provide the design-technology characterization points, based on the advanced imec A10 nanosheet CMOS node and emerging high-density voltage-gated spin-orbit torque (VGSOT) magnetic memories (MRAM), combined with memory-on-logic fine-pitch 3-D wafer-to-wafer hybrid bonding. We observe that 3-D system integration of a static random-access memory (SRAM)-based design leads to 9% power savings and a 53% footprint reduction at iso-frequency with respect to the 2-D implementation with the same memory capacity. Three-dimensional nonvolatile memory (NVM)-VGSOT allows a $4\times$ memory capacity increase with a 30% footprint reduction at iso-power compared with the 2-D SRAM $1\times$ case. Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that more resources allow better workload mapping possibilities, which can compensate for the degradation in peak system energy efficiency in high-memory-capacity cases. We show that a 25% peak efficiency reduction at $32\times$ memory capacity can lead to $7.4\times$ faster execution with $5.7\times$ higher effective TOPS/W than the $1\times$ memory capacity case on the same technology.
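
The ratios quoted in the abstract can be combined into two back-of-the-envelope figures. The Python sketch below is illustrative only and is not part of the authors' framework: the function names are hypothetical, and the second calculation assumes that effective efficiency can be written as peak efficiency times a single mapping factor, which is a simplification not stated in the abstract. Note also that the 53% figure is reported at iso-frequency and the 30% figure at iso-power, so the two density numbers are not measured under identical conditions.

```python
# Illustrative arithmetic on the ratios reported in the abstract.
# Function names and the "effective = peak * mapping factor" model are
# assumptions made for this sketch, not the authors' method.

def density_gain(capacity_ratio: float, footprint_reduction: float) -> float:
    """Memory capacity per unit system footprint, relative to the 2-D baseline."""
    return capacity_ratio / (1.0 - footprint_reduction)


def implied_mapping_gain(effective_ratio: float, peak_reduction: float) -> float:
    """Gain the mapping must contribute if effective = peak * mapping factor."""
    return effective_ratio / (1.0 - peak_reduction)


# 3-D SRAM: same capacity (1x), 53% smaller footprint than 2-D SRAM.
sram_3d = density_gain(1.0, 0.53)

# 3-D NVM-VGSOT: 4x capacity, 30% smaller footprint than 2-D SRAM 1x.
vgsot_3d = density_gain(4.0, 0.30)

# 32x memory: 25% lower peak efficiency, 5.7x higher effective TOPS/W.
mapping = implied_mapping_gain(5.7, 0.25)

print(f"3-D SRAM capacity per footprint:  {sram_3d:.2f}x")
print(f"3-D VGSOT capacity per footprint: {vgsot_3d:.2f}x")
print(f"Implied mapping gain at 32x:      {mapping:.2f}x")
```

Under these assumptions, the 4x-capacity VGSOT configuration holds roughly 5.7 times as much memory per unit footprint as the 2-D SRAM baseline, and the improved mapping at 32x capacity would need to contribute about a 7.6x gain to offset the 25% loss in peak efficiency.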
Source journal metrics: CiteScore 5.00; self-citation rate 4.20%; articles published 11; review time 13 weeks.