NEXUS: A 28nm 3.3pJ/SOP 16-Core Spiking Neural Network With a Diamond Topology for Real-Time Data Processing

IEEE Transactions on Biomedical Circuits and Systems. Pub Date: 2024-08-30. DOI: 10.1109/TBCAS.2024.3452635

Maryam Sadeghi; Yasser Rezaeiyan; Dario Fernandez Khatiboun; Sherif Eissa; Federico Corradi; Charles Augustine; Farshad Moradi

Volume 19, Issue 3, pp. 523-535. Full text: https://ieeexplore.ieee.org/document/10661301/

Citations: 0

Abstract



The realization of brain-scale spiking neural networks (SNNs) is impeded by power constraints and low integration density. To address these challenges, multi-core SNNs are utilized to emulate numerous neurons with high energy efficiency, where spike packets are routed through a network-on-chip (NoC). However, the information can be lost in the NoC under high spike traffic conditions, leading to performance degradation. This work presents NEXUS, a 16-core SNN with a diamond-shaped NoC topology fabricated in 28-nm CMOS technology. It integrates 4096 leaky integrate-and-fire (LIF) neurons with 1M 4-bit synaptic weights, occupying an area of 2.16 mm². The proposed NoC architecture is scalable to any network size, ensuring no data loss due to contending packets with a maximum routing latency of 5.1 μs for 16 cores. The proposed congestion management method eliminates the need for FIFO in routers, resulting in a compact router footprint of 0.001 mm². The proposed neurosynaptic core allows for increasing the processing speed by up to 8.5× depending on input sparsity. The SNN achieves a peak throughput of 4.7 GSOP/s at 0.9 V, consuming a minimum energy per synaptic operation (SOP) of 3.3 pJ at 0.55 V. A 4-layer feed-forward network is mapped onto the chip, classifying MNIST digits with 92.3% accuracy at 8.4K classifications/s and consuming 2.7 μJ/classification. Additionally, an audio recognition task mapped onto the chip achieves 87.4% accuracy at 215 μJ/classification.
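The headline figures in the abstract imply a few derived quantities that it does not state. The Python sketch below works them out. It rests on assumptions that are not in the source: that "1M" means 2^20 weights, that neurons and synapses are split evenly across the 16 cores, and that the MNIST energy and rate were measured at the same operating point. The function and key names are hypothetical, and the results are estimates, not values reported by the authors.

```python
# Back-of-envelope figures derived from numbers quoted in the abstract.
# Derived values are estimates under the stated assumptions, not measurements.

NEURONS = 4096                  # LIF neurons on chip (from the abstract)
CORES = 16                      # neurosynaptic cores (from the abstract)
SYNAPSES = 2 ** 20              # ASSUMPTION: "1M" weights means 2**20
WEIGHT_BITS = 4                 # bits per synaptic weight (from the abstract)

MNIST_ENERGY_J = 2.7e-6         # energy per MNIST classification
MNIST_RATE_HZ = 8.4e3           # MNIST classifications per second


def derived_figures():
    """Return derived per-core and power estimates as a dict."""
    # ASSUMPTION: resources are divided evenly across cores.
    neurons_per_core = NEURONS // CORES
    synapses_per_core = SYNAPSES // CORES
    fan_in_per_neuron = synapses_per_core // neurons_per_core

    # Total weight storage: bits -> bytes -> KiB.
    weight_memory_kib = SYNAPSES * WEIGHT_BITS / 8 / 1024

    # ASSUMPTION: energy and rate refer to the same operating point,
    # so average power = energy per classification * classification rate.
    mnist_power_mw = MNIST_ENERGY_J * MNIST_RATE_HZ * 1e3

    return {
        "neurons_per_core": neurons_per_core,
        "synapses_per_core": synapses_per_core,
        "fan_in_per_neuron": fan_in_per_neuron,
        "weight_memory_kib": weight_memory_kib,
        "mnist_power_mw": mnist_power_mw,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value}")
```

Under these assumptions each core would hold 256 neurons with 256 synapses each, the weights would occupy 512 KiB in total, and MNIST inference would draw roughly 22.7 mW on average.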


Source journal

IEEE Transactions on Biomedical Circuits and Systems

Self-citation rate

0.00%

Publication count