1.63 pJ/SOP Neuromorphic Processor With Integrated Partial Sum Routers for In-Network Computing

IF 2.8 | Zone 2, Engineering & Technology | JCR Q2, Computer Science, Hardware & Architecture

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-06-24 DOI:10.1109/TVLSI.2024.3409652

Dongrui Li;Ming Ming Wong;Yi Sheng Chong;Jun Zhou;Mohit Upadhyay;Ananta Balaji;Aarthy Mani;Weng Fai Wong;Li Shiuan Peh;Anh Tuan Do;Bo Wang

Volume 32, Issue 11, pp. 2085-2092. Full text: https://ieeexplore.ieee.org/document/10570234/

Citations: 0

Abstract

Neuromorphic computing promises unprecedented energy efficiency by emulating the mechanisms of the human brain. Conventional neuromorphic accelerators employ a split-and-merge method to map spiking neural networks whose inputs exceed the fan-in capability of a single neuron core. However, this approach risks compromising accuracy and consumes extra cores for the merging process. Moreover, because spikes are generated from partial sums on different cores rather than from the total sum, aggregating them requires excessive data movement and clock cycles, with substantial power and energy overhead. This work presents a novel approach to addressing the challenges imposed by the split-and-merge method. We propose an energy-efficient, reconfigurable neuromorphic processor that leverages several key techniques to mitigate these issues. First, we introduce partial sum router circuitry that enables in-network computing (INC), eliminating the need for extra merge cores. Second, we adopt software-defined Networks-on-Chip (NoCs) with predefined, efficient routing, eliminating power-hungry routing computation. Finally, we incorporate fine-grained power gating and clock gating for further power reduction. Experimental results from our test chip demonstrate lossless mapping of the algorithm and exceptional energy efficiency, with an energy consumption of 1.63 pJ/SOP at 0.48 V. This represents a 22.4% improvement over state-of-the-art results. The proposed neuromorphic processor provides an efficient and flexible solution for neural network processing, mitigating the limitations of the traditional split-and-merge approach while delivering superior energy efficiency.
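The accuracy risk of split-and-merge described in the abstract can be illustrated with a toy model. The sketch below is not the paper's circuit or algorithm: the threshold value, the merge weight, and the function names `fires`, `split_and_merge`, and `in_network_sum` are all hypothetical, chosen only to show why thresholding partial sums per core can disagree with thresholding the total sum once.

```python
# Illustrative sketch only: a toy integrate-and-fire neuron whose fan-in is
# split across several cores. All names and numbers here are assumptions
# made for illustration; none are taken from the paper.

THRESHOLD = 10  # hypothetical firing threshold


def fires(total):
    """Return 1 if the accumulated input reaches the threshold, else 0."""
    return 1 if total >= THRESHOLD else 0


def split_and_merge(partial_sums, merge_weight=THRESHOLD):
    """Each core thresholds its own partial sum, then a separate merge core
    accumulates the resulting spikes and thresholds again."""
    spikes = [fires(p) for p in partial_sums]
    return fires(merge_weight * sum(spikes))


def in_network_sum(partial_sums):
    """Partial sums are added together along the route, and the threshold
    is applied once, to the total."""
    total = 0
    for p in partial_sums:  # one addition per hop
        total += p
    return fires(total)


# Case 1: two sub-threshold partial sums whose total (12) crosses the threshold.
missed = split_and_merge([6, 6])   # neither core fires, so the merge stays silent
exact_1 = in_network_sum([6, 6])   # 12 >= 10, so the neuron fires

# Case 2: an inhibitory partial sum that should cancel an excitatory one.
spurious = split_and_merge([12, -8])  # first core fires alone and triggers the merge
exact_2 = in_network_sum([12, -8])    # 4 < 10, so the neuron stays silent
```

In this toy setting the split-and-merge path both misses a spike and produces a spurious one, while summing before thresholding matches the unsplit neuron in both cases. It also needs no extra merge stage, which is the intuition behind eliminating merge cores.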

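Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months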

About the journal: The IEEE Transactions on VLSI Systems is published monthly under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. The IEEE Transactions on VLSI Systems was founded to address this critical area through a common forum. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. The coverage of these Transactions thus focuses on VLSI/ULSI microelectronic systems integration.