System-Technology Co-Optimization for Dense Edge Architectures Using 3-D Integration and Nonvolatile Memory

Impact factor 2.0, JCR Q3 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Leandro M. Giacomini Rocha;Mohamed Naeim;Guilherme Paim;Moritz Brunion;Priya Venugopal;Dragomir Milojevic;James Myers;Mustafa Badaroglu;Marian Verhelst;Julien Ryckaert;Dwaipayan Biswas
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 10, pp. 125-134, published 2024-11-11 (Journal Article)
DOI: 10.1109/JXCDC.2024.3496118
https://ieeexplore.ieee.org/document/10750212/
Citations: 0

Abstract

High-performance edge artificial intelligence (Edge-AI) inference applications target high energy efficiency, high memory density, and a small form factor, which requires design-space exploration across the whole stack: workloads, architecture, mapping, and co-optimization with emerging technologies. In this article, we present a system-technology co-optimization (STCO) framework that connects workload-driven system scaling challenges with the technology offerings enabled by physical design. The framework is built on three engines: a physical design characterization engine, a dataflow mapping optimizer, and a system efficiency predictor. It builds on a systolic array accelerator to generate design-technology characterization points in the advanced imec A10 nanosheet CMOS node, together with emerging high-density voltage-gated spin-orbit torque (VGSOT) magnetic random-access memory (MRAM) and memory-on-logic fine-pitch 3-D wafer-to-wafer hybrid bonding. For the same memory capacity, 3-D integration of the static random-access memory (SRAM)-based design yields 9% power savings and a 53% footprint reduction at iso-frequency relative to the 2-D implementation. Three-dimensional nonvolatile memory (NVM) based on VGSOT enables a $4\times$ increase in memory capacity with a 30% footprint reduction at iso-power compared with the $1\times$ 2-D SRAM baseline. Our exploration with two diverse workloads, image resolution enhancement (FSRCNN) and eye tracking (EDSNet), shows that additional resources open up better workload mapping options, which can compensate for the degradation in peak system energy efficiency at high memory capacities. We show that, on the same technology, a $32\times$ memory capacity configuration with 25% lower peak efficiency can achieve $7.4\times$ faster execution and $5.7\times$ higher effective TOPS/W than the $1\times$ memory capacity case.
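The final claim, that a configuration with lower peak efficiency can still deliver a much higher effective TOPS/W, rests on better mapping raising hardware utilization. The minimal Python sketch below makes that arithmetic explicit under a deliberately simplified first-order model. The model itself (effective TOPS/W approximated as peak TOPS/W times average PE utilization, with power assumed roughly independent of PE activity), the helper names `effective_tops_per_w` and `implied_utilization_gain`, and the absolute TOPS/W and utilization values are all assumptions for illustration. They are not taken from the paper and do not represent the authors' system efficiency predictor.

```python
# First-order sketch (an assumption, not the authors' system efficiency predictor):
# effective TOPS/W is approximated as peak TOPS/W times average PE utilization,
# i.e. the accelerator is assumed to draw roughly the same power whether or not
# its PEs are busy.


def effective_tops_per_w(peak_tops_per_w: float, utilization: float) -> float:
    """Hypothetical helper: effective efficiency under the simplified model."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    return peak_tops_per_w * utilization


def implied_utilization_gain(peak_ratio: float, effective_ratio: float) -> float:
    """Hypothetical helper: utilization ratio that turns a peak ratio into an effective ratio."""
    # effective_ratio = peak_ratio * utilization_ratio, solved for utilization_ratio
    return effective_ratio / peak_ratio


if __name__ == "__main__":
    # Ratios reported in the abstract (32x versus 1x memory capacity, same technology).
    peak_ratio = 1.0 - 0.25  # 25% lower peak efficiency
    effective_ratio = 5.7    # 5.7x higher effective TOPS/W
    gain = implied_utilization_gain(peak_ratio, effective_ratio)
    print(f"implied utilization gain: {gain:.1f}x")

    # Hypothetical absolute values, chosen only to reproduce the ratios above.
    base = effective_tops_per_w(peak_tops_per_w=10.0, utilization=0.10)   # 1x memory
    dense = effective_tops_per_w(peak_tops_per_w=7.5, utilization=0.76)   # 32x memory
    print(f"effective TOPS/W ratio: {dense / base:.1f}x")
```

Under this model, a 25% lower peak combined with a 5.7× effective gain implies roughly 7.6× higher average utilization in the $32\times$ configuration. A fuller model would also account for utilization-dependent dynamic energy, leakage, and memory access costs, so this figure is only indicative.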
Source journal
CiteScore: 5.00
Self-citation rate: 4.20%
Articles published: 11
Review time: 13 weeks