Faquan Chen;Qingyang Tian;Lisheng Xie;Yifan Zhou;Ziren Wu;Liangshun Wu;Rendong Ying;Fei Wen;Peilin Liu
EPOC: A 28-nm 5.3 pJ/SOP Event-Driven Parallel Neuromorphic Hardware With Neuromodulation-Based Online Learning
IEEE Transactions on Biomedical Circuits and Systems, vol. 19, no. 3, pp. 629-644. Published 2024-10-02. DOI: 10.1109/TBCAS.2024.3470520
https://ieeexplore.ieee.org/document/10704604/
Abstract
Bio-inspired neuromorphic hardware with learning ability is a promising route to human-like intelligence, particularly in terms of high energy efficiency and strong environmental adaptability. Although many customized prototypes have demonstrated learning ability, learning on neuromorphic hardware still lacks a bio-plausible, unified framework, and the inherent spike-based sparsity and parallelism have not been fully exploited, which fundamentally limits computational efficiency and scale. We therefore develop EPOC, a unified, event-driven, massively parallel multi-core neuromorphic online learning processor. We present a neuromodulation-based neuromorphic online learning framework that unifies various learning algorithms; through different neuromodulator formulations, EPOC supports high-accuracy local and global supervised Spiking Neural Network (SNN) learning with a low-memory-demand, streaming single-sample learning strategy. EPOC combines a novel event-driven computation method, which fully exploits spike-based sparsity throughout the forward and backward learning phases, with a parallel multi-channel, multi-core computing architecture, yielding a 9.9$\times$ time-efficiency improvement over the baseline architecture. We synthesize EPOC in a 28-nm CMOS process and perform extensive benchmarking. EPOC achieves state-of-the-art learning accuracies of 99.2%, 98.2%, and 94.3% on the MNIST, NMNIST, and DVS-Gesture benchmarks, respectively. Local-learning EPOC achieves a 2.9$\times$ time-efficiency improvement over its global-learning counterpart. EPOC operates at a typical clock frequency of 100 MHz, providing a peak throughput of 328 GOPS/51 GSOPS and an energy efficiency of 5.3 pJ/SOP.
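To make the idea of exploiting spike-based sparsity concrete, the following is a minimal Python sketch of a generic event-driven synaptic accumulation compared with a dense matrix-vector product. It is not EPOC's datapath, which the abstract does not describe; the function names, the layer size (784 inputs, 128 outputs), and the 5% input activity are all hypothetical choices made for illustration. The energy estimate simply multiplies the number of synaptic operations (SOPs) by the reported 5.3 pJ/SOP, which assumes that figure applies uniformly to every operation.

```python
import numpy as np

PJ_PER_SOP = 5.3  # energy per synaptic operation, as reported in the abstract


def dense_forward(weights, spikes):
    """Baseline: visit every weight, whether or not its input spiked."""
    return weights @ spikes, weights.size


def event_driven_forward(weights, spikes):
    """Visit only the weight columns whose presynaptic neuron spiked."""
    active = np.flatnonzero(spikes)            # indices of input spike events
    current = weights[:, active].sum(axis=1)   # accumulate one column per event
    sops = active.size * weights.shape[0]      # synaptic operations performed
    return current, sops


rng = np.random.default_rng(0)
weights = rng.normal(size=(128, 784))                 # hypothetical layer
spikes = (rng.random(784) < 0.05).astype(np.float64)  # assumed ~5% activity

dense_out, dense_ops = dense_forward(weights, spikes)
event_out, event_sops = event_driven_forward(weights, spikes)

# Illustrative energy estimate in nanojoules (1 nJ = 1000 pJ).
energy_nj = event_sops * PJ_PER_SOP * 1e-3

print(f"dense weight visits: {dense_ops}")
print(f"event-driven SOPs:   {event_sops}")
print(f"estimated energy:    {energy_nj:.2f} nJ")
```

Both functions produce the same output currents; the event-driven version does work proportional to the number of spikes rather than to the layer size, which is the general property that event-driven neuromorphic designs rely on.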