Dongrui Li;Ming Ming Wong;Yi Sheng Chong;Jun Zhou;Mohit Upadhyay;Ananta Balaji;Aarthy Mani;Weng Fai Wong;Li Shiuan Peh;Anh Tuan Do;Bo Wang
{"title":"1.63 pJ/SOP Neuromorphic Processor With Integrated Partial Sum Routers for In-Network Computing","authors":"Dongrui Li;Ming Ming Wong;Yi Sheng Chong;Jun Zhou;Mohit Upadhyay;Ananta Balaji;Aarthy Mani;Weng Fai Wong;Li Shiuan Peh;Anh Tuan Do;Bo Wang","doi":"10.1109/TVLSI.2024.3409652","DOIUrl":null,"url":null,"abstract":"Neuromorphic computing is promising to achieve unprecedented energy efficiency by emulating the human brain’s mechanism. Conventional neuromorphic accelerators employ split-and-merge method to map spiking neural networks’ inputs to surpass the fan-in capabilities of a single neuron core. However, this approach gives rise to the risk of accuracy compromise and extra core usage for the merging process. Moreover, it requires excessive data movement and clock cycles to aggregate spikes generated by partial sums instead of total sums obtained from different cores with substantial power and energy overhead. This work presents a novel approach to addressing the challenges imposed by the split-and-merge method. We propose an energy-efficient, reconfigurable neuromorphic processor that leverages several key techniques to mitigate the above issues. First, we introduce a partial sum router circuitry that enables in-network computing (INC), eliminating the need for extra merge cores. Second, we adopt software-defined Networks-on-Chip (NoCs) by leveraging predefined, efficient routing, eliminating power-hungry routing computation. At last, we incorporate fine-grained power gating and clock gating techniques for further power reduction. Experimental results from our test chip demonstrate the lossless mapping of the algorithm and exceptional energy efficiency, achieving an energy consumption of 1.63 pJ/SOP at 0.48 V. This energy efficiency represents a 22.4% improvement compared to the state-of-the-art results. Our proposed neuromorphic processor provides an efficient and flexible solution for neural network processing, mitigating the limitations of the traditional split-and-merge approach while delivering superior energy efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2085-2092"},"PeriodicalIF":2.8000,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10570234/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
Neuromorphic computing is promising to achieve unprecedented energy efficiency by emulating the human brain’s mechanism. Conventional neuromorphic accelerators employ split-and-merge method to map spiking neural networks’ inputs to surpass the fan-in capabilities of a single neuron core. However, this approach gives rise to the risk of accuracy compromise and extra core usage for the merging process. Moreover, it requires excessive data movement and clock cycles to aggregate spikes generated by partial sums instead of total sums obtained from different cores with substantial power and energy overhead. This work presents a novel approach to addressing the challenges imposed by the split-and-merge method. We propose an energy-efficient, reconfigurable neuromorphic processor that leverages several key techniques to mitigate the above issues. First, we introduce a partial sum router circuitry that enables in-network computing (INC), eliminating the need for extra merge cores. Second, we adopt software-defined Networks-on-Chip (NoCs) by leveraging predefined, efficient routing, eliminating power-hungry routing computation. At last, we incorporate fine-grained power gating and clock gating techniques for further power reduction. Experimental results from our test chip demonstrate the lossless mapping of the algorithm and exceptional energy efficiency, achieving an energy consumption of 1.63 pJ/SOP at 0.48 V. This energy efficiency represents a 22.4% improvement compared to the state-of-the-art results. Our proposed neuromorphic processor provides an efficient and flexible solution for neural network processing, mitigating the limitations of the traditional split-and-merge approach while delivering superior energy efficiency.
期刊介绍:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems have been founded. The editorial board, consisting of international experts, invites original papers which emphasize and merit the novel systems integration aspects of microelectronic systems including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems level qualification. Thus, the coverage of these Transactions will focus on VLSI/ULSI microelectronic systems integration.