HuNT：利用异构 PIM 设备设计用于 DNN 训练的三维多核架构

IF 2.7 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI:10.1109/TCAD.2024.3444708

Chukwufumnanya Ogbogu;Gaurav Narang;Biresh Kumar Joardar;Janardhan Rao Doppa;Krishnendu Chakrabarty;Partha Pratim Pande

{"title":"HuNT：利用异构 PIM 设备设计用于 DNN 训练的三维多核架构","authors":"Chukwufumnanya Ogbogu;Gaurav Narang;Biresh Kumar Joardar;Janardhan Rao Doppa;Krishnendu Chakrabarty;Partha Pratim Pande","doi":"10.1109/TCAD.2024.3444708","DOIUrl":null,"url":null,"abstract":"Processing-in-memory (PIM) architectures have emerged as an attractive computing paradigm for accelerating deep neural network (DNN) training and inferencing. However, a plethora of PIM devices, e.g., resistive random-access memory, ferroelectric field-effect transistor, phase change memory, MRAM, static random-access memory, exists and each of these devices offers advantages and drawbacks in terms of power, latency, area, and nonidealities. A heterogeneous architecture that combines the benefits of multiple devices in a single platform can enable energy-efficient and high-performance DNN training and inference. 3-D integration enables the design of such a heterogeneous architecture where multiple planar tiers consisting of different PIM devices can be integrated into a single platform. In this work, we propose the HuNT framework, which hunts for (finds) an optimal DNN neural layer mapping, and planar tier configurations for a 3-D heterogeneous architecture. Overall, our experimental results demonstrate that the HuNT-enabled 3-D heterogeneous architecture achieves up to \n<inline-formula> <tex-math>$10 {\\times }$ </tex-math></inline-formula>\n and \n<inline-formula> <tex-math>$3.5 {\\times }$ </tex-math></inline-formula>\n improvement with respect to the homogeneous and existing heterogeneous PIM-based architectures, respectively, in terms of energy-efficiency (TOPS/W). Similarly, the proposed HuNT-enabled architecture outperforms existing homogeneous and heterogeneous architectures by up to \n<inline-formula> <tex-math>$8 {\\times }$ </tex-math></inline-formula>\n and \n<inline-formula> <tex-math>$2.4\\times $ </tex-math></inline-formula>\n, respectively, in terms of compute-efficiency (TOPS/mm2) without compromising the final DNN accuracy.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3300-3311"},"PeriodicalIF":2.7000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HuNT: Exploiting Heterogeneous PIM Devices to Design a 3-D Manycore Architecture for DNN Training\",\"authors\":\"Chukwufumnanya Ogbogu;Gaurav Narang;Biresh Kumar Joardar;Janardhan Rao Doppa;Krishnendu Chakrabarty;Partha Pratim Pande\",\"doi\":\"10.1109/TCAD.2024.3444708\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Processing-in-memory (PIM) architectures have emerged as an attractive computing paradigm for accelerating deep neural network (DNN) training and inferencing. However, a plethora of PIM devices, e.g., resistive random-access memory, ferroelectric field-effect transistor, phase change memory, MRAM, static random-access memory, exists and each of these devices offers advantages and drawbacks in terms of power, latency, area, and nonidealities. A heterogeneous architecture that combines the benefits of multiple devices in a single platform can enable energy-efficient and high-performance DNN training and inference. 3-D integration enables the design of such a heterogeneous architecture where multiple planar tiers consisting of different PIM devices can be integrated into a single platform. In this work, we propose the HuNT framework, which hunts for (finds) an optimal DNN neural layer mapping, and planar tier configurations for a 3-D heterogeneous architecture. Overall, our experimental results demonstrate that the HuNT-enabled 3-D heterogeneous architecture achieves up to \\n<inline-formula> <tex-math>$10 {\\\\times }$ </tex-math></inline-formula>\\n and \\n<inline-formula> <tex-math>$3.5 {\\\\times }$ </tex-math></inline-formula>\\n improvement with respect to the homogeneous and existing heterogeneous PIM-based architectures, respectively, in terms of energy-efficiency (TOPS/W). Similarly, the proposed HuNT-enabled architecture outperforms existing homogeneous and heterogeneous architectures by up to \\n<inline-formula> <tex-math>$8 {\\\\times }$ </tex-math></inline-formula>\\n and \\n<inline-formula> <tex-math>$2.4\\\\times $ </tex-math></inline-formula>\\n, respectively, in terms of compute-efficiency (TOPS/mm2) without compromising the final DNN accuracy.\",\"PeriodicalId\":13251,\"journal\":{\"name\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"volume\":\"43 11\",\"pages\":\"3300-3311\"},\"PeriodicalIF\":2.7000,\"publicationDate\":\"2024-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10745791/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10745791/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

摘要

内存处理（PIM）架构已成为加速深度神经网络（DNN）训练和推理的一种极具吸引力的计算模式。然而，目前存在大量 PIM 设备，例如电阻式随机存取存储器、铁电场效应晶体管、相变存储器、MRAM、静态随机存取存储器，这些设备在功耗、延迟、面积和非理想性方面各有优缺点。在单个平台中结合多种器件优势的异构架构可实现高能效、高性能的 DNN 训练和推理。三维集成可以设计这样一种异构架构，将由不同 PIM 设备组成的多个平面层集成到一个平台中。在这项工作中，我们提出了 HuNT 框架，它可以为三维异构架构寻找（发现）最佳 DNN 神经层映射和平面层配置。总体而言，我们的实验结果表明，与基于 PIM 的同构架构和现有异构架构相比，支持 HuNT 的三维异构架构在能效（TOPS/W）方面分别实现了高达 10 {\times }$ 美元和 3.5 {\times }$ 美元的改进。同样，在计算效率（TOPS/mm2）方面，所提出的支持 HuNT 的架构比现有的同构和异构架构分别高出 8 {\times }$ 美元和 2.4 {\times }$ 美元，而不会影响最终 DNN 的准确性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

HuNT: Exploiting Heterogeneous PIM Devices to Design a 3-D Manycore Architecture for DNN Training

Processing-in-memory (PIM) architectures have emerged as an attractive computing paradigm for accelerating deep neural network (DNN) training and inferencing. However, a plethora of PIM devices, e.g., resistive random-access memory, ferroelectric field-effect transistor, phase change memory, MRAM, static random-access memory, exists and each of these devices offers advantages and drawbacks in terms of power, latency, area, and nonidealities. A heterogeneous architecture that combines the benefits of multiple devices in a single platform can enable energy-efficient and high-performance DNN training and inference. 3-D integration enables the design of such a heterogeneous architecture where multiple planar tiers consisting of different PIM devices can be integrated into a single platform. In this work, we propose the HuNT framework, which hunts for (finds) an optimal DNN neural layer mapping, and planar tier configurations for a 3-D heterogeneous architecture. Overall, our experimental results demonstrate that the HuNT-enabled 3-D heterogeneous architecture achieves up to

$10 {\times }$

and

$3.5 {\times }$

improvement with respect to the homogeneous and existing heterogeneous PIM-based architectures, respectively, in terms of energy-efficiency (TOPS/W). Similarly, the proposed HuNT-enabled architecture outperforms existing homogeneous and heterogeneous architectures by up to

$8 {\times }$

and

$2.4\times $

, respectively, in terms of compute-efficiency (TOPS/mm2) without compromising the final DNN accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 工程技术-工程：电子与电气

CiteScore

5.60

自引率

13.80%

发文量

500

审稿时长

7 months

期刊介绍： The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.