System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory

IF 2.7 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date : 2024-11-11 DOI:10.1109/JXCDC.2024.3496118

Leandro M. Giacomini Rocha;Mohamed Naeim;Guilherme Paim;Moritz Brunion;Priya Venugopal;Dragomir Milojevic;James Myers;Mustafa Badaroglu;Marian Verhelst;Julien Ryckaert;Dwaipayan Biswas


Citations: 0

Abstract


System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory

High-performance edge artificial intelligence (Edge-AI) inference applications aim for high energy efficiency, memory density, and small form factor, requiring a design-space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technology. In this article, we present a system-technology co-optimization (STCO) framework that interfaces with workload-driven system scaling challenges and physical design-enabled technology offerings. The framework is built on three engines that provide the physical design characterization, dataflow mapping optimizer, and system efficiency predictor. The framework builds on a systolic array accelerator to provide the design-technology characterization points using the advanced imec A10 nanosheet CMOS node along with emerging, high-density voltage-gated spin-orbit torque (VGSOT) magnetic memories (MRAM), combined with memory-on-logic fine-pitch 3-D wafer-to-wafer hybrid bonding. We observe that the 3-D system integration of a static random-access memory (SRAM)-based design leads to 9% power savings with 53% footprint reduction at iso-frequency with respect to the 2-D implementation for the same memory capacity. Three-dimensional nonvolatile memory (NVM)-VGSOT allows a $4\times$ memory capacity increase with 30% footprint reduction at iso-power compared with 2-D SRAM $1\times$. Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that more resources allow better workload mapping possibilities, which are able to compensate for the peak system energy efficiency degradation in high memory capacity cases. We show that a 25% peak efficiency reduction on a $32\times$ memory capacity can lead to a $7.4\times$ faster execution with $5.7\times$ higher effective TOPS/W than the $1\times$ memory capacity case on the same technology.
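The headline ratios in the abstract can be combined with simple arithmetic. The sketch below is only an illustration and is not taken from the paper: it assumes that effective TOPS/W means useful operations per joule for the same workload, and that effective efficiency can be modeled as peak efficiency times a mapping-efficiency factor. The function and key names are hypothetical.

```python
def implied_ratios(peak_eff_ratio, speedup, effective_eff_ratio):
    """Ratios of the 32x memory case to the 1x memory case.

    Assumptions (not from the paper): both cases run the same workload,
    effective TOPS/W = useful ops / energy, and
    effective efficiency = peak efficiency * mapping-efficiency factor.
    """
    # Same ops at higher ops-per-joule means proportionally less energy.
    energy_ratio = 1.0 / effective_eff_ratio
    # Average power = energy / time, and the time ratio is 1 / speedup.
    power_ratio = energy_ratio * speedup
    # Gain the mapping must deliver to offset the lower peak efficiency.
    mapping_gain = effective_eff_ratio / peak_eff_ratio
    return {
        "energy_ratio": energy_ratio,
        "power_ratio": power_ratio,
        "mapping_gain": mapping_gain,
    }


# Figures quoted in the abstract: 25% lower peak efficiency,
# 7.4x faster execution, 5.7x higher effective TOPS/W.
ratios = implied_ratios(peak_eff_ratio=0.75, speedup=7.4, effective_eff_ratio=5.7)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the larger-memory configuration would use roughly 0.18 times the energy per run at about 1.30 times the average power, and the mapping would have to contribute about a 7.6-fold gain to turn a 25% lower peak efficiency into a 5.7-fold higher effective efficiency.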


Source journal