Big-Little Chiplets for In-Memory Acceleration of DNNs: A Scalable Heterogeneous Architecture

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD), Pub Date: 2022-10-29, DOI: 10.1145/3508352.3549447

Gokul Krishnan, A. Goksoy, Sumit K. Mandal, Zhenyu Wang, C. Chakrabarti, Jae-sun Seo, U. Ogras, Yu Cao


Citations: 2

Abstract

Monolithic in-memory computing (IMC) architectures face significant yield and fabrication cost challenges as the complexity of DNNs increases. Chiplet-based IMCs that integrate multiple dies with advanced 2.5D/3D packaging offer a low-cost and scalable solution. They enable heterogeneous architectures in which the chiplets and their associated interconnect can be tailored to the non-uniform algorithmic structure of a DNN to maximize IMC utilization and reduce energy consumption. This paper proposes a heterogeneous IMC architecture with big-little chiplets and a hybrid network-on-package (NoP) to optimize utilization, interconnect bandwidth, and energy efficiency. For a given DNN, we develop a custom methodology to map the model onto the big-little architecture such that the early layers in the DNN are mapped to the little chiplets with higher NoP bandwidth and the subsequent layers are mapped to the big chiplets with lower NoP bandwidth. Furthermore, we achieve a scalable solution by incorporating a DRAM into each chiplet to support a wide range of DNNs beyond the area limit. Compared to a homogeneous chiplet-based IMC architecture, the proposed big-little architecture achieves up to 329× improvement in the energy-delay-area product (EDAP) and up to 2× higher IMC utilization. Experimental evaluation of the proposed big-little chiplet-based RRAM IMC architecture for ResNet-50 on ImageNet shows 259×, 139×, and 48× improvement in energy efficiency at lower area compared to the Nvidia V100 GPU, Nvidia T4 GPU, and SIMBA architecture, respectively.
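To make the two quantitative ideas in the abstract concrete, the following is a minimal Python sketch. It is not the authors' mapping methodology, which the abstract does not detail. It assumes that EDAP is the plain product of energy, delay, and area, and that layers are assigned in network order, with early layers filling a little-chiplet budget before the remaining layers go to big chiplets. The function names, the per-layer crossbar counts, and the `little_budget` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def edap(energy: float, delay: float, area: float) -> float:
    """Energy-delay-area product (lower is better).

    Assumption: a plain product of the three quantities; units are
    whatever the caller uses, as long as they are consistent.
    """
    return energy * delay * area


def edap_improvement(baseline: float, proposed: float) -> float:
    """Improvement factor of a proposed design over a baseline EDAP."""
    return baseline / proposed


def split_layers(crossbars_per_layer: List[int],
                 little_budget: int) -> Tuple[List[int], List[int]]:
    """Split layer indices into (little, big) chiplet groups.

    Hypothetical greedy rule: walk the layers in order and keep placing
    them on little chiplets while the crossbar budget allows. Once a
    layer does not fit, it and all later layers go to big chiplets, so
    the little group is always a contiguous prefix of the network.
    """
    little: List[int] = []
    big: List[int] = []
    used = 0
    for idx, need in enumerate(crossbars_per_layer):
        if not big and used + need <= little_budget:
            little.append(idx)
            used += need
        else:
            big.append(idx)
    return little, big


if __name__ == "__main__":
    # Made-up crossbar demands for a five-layer network.
    little, big = split_layers([2, 4, 8, 16, 32], little_budget=14)
    print("little chiplets:", little)
    print("big chiplets:", big)
    print("EDAP gain:", edap_improvement(edap(4.0, 2.0, 3.0), edap(1.0, 2.0, 3.0)))
```

The contiguous-prefix rule mirrors the abstract's statement that early layers go to little chiplets and subsequent layers to big chiplets. A real mapper would also have to account for NoP bandwidth and per-chiplet capacity, which this sketch ignores.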


