PRLM: A parallel loading mechanism for a deep neural network accelerator based on NoC

IF 1.9 · CAS Zone 3 (Engineering & Technology) · JCR Q3 (Engineering, Electrical & Electronic)
Yiming Ouyang, Chengming An, Jianhua Li, HuaGuo Liang
{"title":"基于NoC的深度神经网络加速器并行加载机制","authors":"Yiming Ouyang ,&nbsp;Chengming An ,&nbsp;Jianhua Li ,&nbsp;HuaGuo Liang","doi":"10.1016/j.mejo.2025.106684","DOIUrl":null,"url":null,"abstract":"<div><div>Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. This proves the superior performance of the mechanism.</div></div>","PeriodicalId":49818,"journal":{"name":"Microelectronics Journal","volume":"160 ","pages":"Article 106684"},"PeriodicalIF":1.9000,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PRLM: A parallel loading mechanism for a deep neural network accelerator based on NoC\",\"authors\":\"Yiming Ouyang ,&nbsp;Chengming An ,&nbsp;Jianhua Li ,&nbsp;HuaGuo Liang\",\"doi\":\"10.1016/j.mejo.2025.106684\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. 
This proves the superior performance of the mechanism.</div></div>\",\"PeriodicalId\":49818,\"journal\":{\"name\":\"Microelectronics Journal\",\"volume\":\"160 \",\"pages\":\"Article 106684\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microelectronics Journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S187923912500133X\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microelectronics Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S187923912500133X","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource-constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. This proves the superior performance of the mechanism.
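
The abstract describes path-based multicast only at a high level, so the sketch below is purely illustrative rather than the paper's MIAO router: it contrasts per-destination unicast replication with a single path-based multicast packet on a small 2D mesh, using a hypothetical snake-order destination ordering and Manhattan (XY) hop counts. All names, the mesh size, and the node coordinates are assumptions made for this example.

```python
# Illustrative sketch (not the MIAO router from the paper): path-based
# multicast on a 2D mesh NoC versus per-destination unicast replication.
# Snake-order labelling and XY hop counting are simplifying assumptions.

MESH_W = 4  # assumed width of a 4x4 mesh

def snake_label(node):
    """Order nodes along a Hamiltonian (snake) path over the mesh."""
    x, y = node
    return y * MESH_W + (x if y % 2 == 0 else MESH_W - 1 - x)

def hops(a, b):
    """Manhattan hop count between two mesh nodes (XY routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def unicast_hops(src, dests):
    """Baseline: the source injects one replicated packet per destination."""
    return sum(hops(src, d) for d in dests)

def multicast_hops(src, dests):
    """Path-based multicast: a single packet visits the destinations in
    snake order, dropping a copy at each stop, so shared path segments
    are traversed only once."""
    stops = [src] + sorted(dests, key=snake_label)
    return sum(hops(a, b) for a, b in zip(stops, stops[1:]))

if __name__ == "__main__":
    src = (0, 0)                                      # e.g., memory-interface node
    dests = [(1, 0), (3, 0), (1, 2), (3, 2), (2, 3)]  # PEs sharing the same weights
    print("unicast hop count:  ", unicast_hops(src, dests))    # prints 17
    print("multicast hop count:", multicast_hops(src, dests))  # prints 11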
Source journal
Microelectronics Journal (Engineering, Electrical & Electronic)
CiteScore: 4.00
Self-citation rate: 27.30%
Annual articles: 222
Review time: 43 days
Aims and scope: Published since 1969, the Microelectronics Journal is an international forum for the dissemination of research and applications of microelectronic systems, circuits, and emerging technologies. Papers published in the Microelectronics Journal have undergone peer review to ensure originality, relevance, and timeliness. The journal thus provides a worldwide, regular, and comprehensive update on microelectronic circuits and systems. The Microelectronics Journal invites papers describing significant research and applications in all of the areas listed below. Comprehensive review/survey papers covering recent developments will also be considered. The Microelectronics Journal covers circuits and systems. This topic includes but is not limited to: Analog, digital, mixed, and RF circuits and related design methodologies; Logic, architectural, and system-level synthesis; Testing, design for testability, built-in self-test; Area, power, and thermal analysis and design; Mixed-domain simulation and design; Embedded systems; Non-von Neumann computing and related technologies and circuits; Design and test of high-complexity systems integration; SoC, NoC, SIP, and NIP design and test; 3-D integration design and analysis; Emerging device technologies and circuits, such as FinFETs, SETs, spintronics, SFQ, MTJ, etc. Application aspects such as signal and image processing including circuits for cryptography, sensors, and actuators including sensor networks, reliability and quality issues, and economic models are also welcome.