PRLM: A parallel loading mechanism for a deep neural network accelerator based on NoC

IF 1.9 · CAS Zone 3 (Engineering & Technology) · JCR Q3 (Engineering, Electrical & Electronic)
Yiming Ouyang, Chengming An, Jianhua Li, HuaGuo Liang
{"title":"基于NoC的深度神经网络加速器并行加载机制","authors":"Yiming Ouyang ,&nbsp;Chengming An ,&nbsp;Jianhua Li ,&nbsp;HuaGuo Liang","doi":"10.1016/j.mejo.2025.106684","DOIUrl":null,"url":null,"abstract":"<div><div>Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. This proves the superior performance of the mechanism.</div></div>","PeriodicalId":49818,"journal":{"name":"Microelectronics Journal","volume":"160 ","pages":"Article 106684"},"PeriodicalIF":1.9000,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PRLM: A parallel loading mechanism for a deep neural network accelerator based on NoC\",\"authors\":\"Yiming Ouyang ,&nbsp;Chengming An ,&nbsp;Jianhua Li ,&nbsp;HuaGuo Liang\",\"doi\":\"10.1016/j.mejo.2025.106684\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. 
This proves the superior performance of the mechanism.</div></div>\",\"PeriodicalId\":49818,\"journal\":{\"name\":\"Microelectronics Journal\",\"volume\":\"160 \",\"pages\":\"Article 106684\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microelectronics Journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S187923912500133X\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microelectronics Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S187923912500133X","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource-constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. This proves the superior performance of the mechanism.
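
The abstract describes path-based multicast only at a high level, so the sketch below is purely illustrative rather than the paper's MIAO router: it contrasts per-destination unicast replication with a single path-based multicast packet on a small 2D mesh, using a hypothetical snake-order destination ordering and Manhattan (XY) hop counts. All names, the mesh size, and the node coordinates are assumptions made for this example.

```python
# Illustrative sketch (not the MIAO router from the paper): path-based
# multicast on a 2D mesh NoC versus per-destination unicast replication.
# Snake-order labelling and XY hop counting are simplifying assumptions.

MESH_W = 4  # assumed width of a 4x4 mesh

def snake_label(node):
    """Order nodes along a Hamiltonian (snake) path over the mesh."""
    x, y = node
    return y * MESH_W + (x if y % 2 == 0 else MESH_W - 1 - x)

def hops(a, b):
    """Manhattan hop count between two mesh nodes (XY routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def unicast_hops(src, dests):
    """Baseline: the source injects one replicated packet per destination."""
    return sum(hops(src, d) for d in dests)

def multicast_hops(src, dests):
    """Path-based multicast: a single packet visits the destinations in
    snake order, dropping a copy at each stop, so shared path segments
    are traversed only once."""
    stops = [src] + sorted(dests, key=snake_label)
    return sum(hops(a, b) for a, b in zip(stops, stops[1:]))

if __name__ == "__main__":
    src = (0, 0)                                      # e.g., memory-interface node
    dests = [(1, 0), (3, 0), (1, 2), (3, 2), (2, 3)]  # PEs sharing the same weights
    print("unicast hop count:  ", unicast_hops(src, dests))    # prints 17
    print("multicast hop count:", multicast_hops(src, dests))  # prints 11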
Source journal
Microelectronics Journal (Engineering, Electrical & Electronic)
CiteScore: 4.00
Self-citation rate: 27.30%
Annual articles: 222
Review time: 43 days
Aims and scope: Published since 1969, the Microelectronics Journal is an international forum for the dissemination of research and applications of microelectronic systems, circuits, and emerging technologies. Papers published in the Microelectronics Journal have undergone peer review to ensure originality, relevance, and timeliness. The journal thus provides a worldwide, regular, and comprehensive update on microelectronic circuits and systems. The Microelectronics Journal invites papers describing significant research and applications in all of the areas listed below. Comprehensive review/survey papers covering recent developments will also be considered. The Microelectronics Journal covers circuits and systems. This topic includes but is not limited to: Analog, digital, mixed, and RF circuits and related design methodologies; Logic, architectural, and system-level synthesis; Testing, design for testability, built-in self-test; Area, power, and thermal analysis and design; Mixed-domain simulation and design; Embedded systems; Non-von Neumann computing and related technologies and circuits; Design and test of high-complexity systems integration; SoC, NoC, SIP, and NIP design and test; 3-D integration design and analysis; Emerging device technologies and circuits, such as FinFETs, SETs, spintronics, SFQ, MTJ, etc. Application aspects such as signal and image processing including circuits for cryptography, sensors, and actuators including sensor networks, reliability and quality issues, and economic models are also welcome.