Yiming Ouyang, Chengming An, Jianhua Li, HuaGuo Liang
{"title":"基于NoC的深度神经网络加速器并行加载机制","authors":"Yiming Ouyang , Chengming An , Jianhua Li , HuaGuo Liang","doi":"10.1016/j.mejo.2025.106684","DOIUrl":null,"url":null,"abstract":"<div><div>Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. This proves the superior performance of the mechanism.</div></div>","PeriodicalId":49818,"journal":{"name":"Microelectronics Journal","volume":"160 ","pages":"Article 106684"},"PeriodicalIF":1.9000,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"PRLM: A parallel loading mechanism for a deep neural network accelerator based on NoC\",\"authors\":\"Yiming Ouyang , Chengming An , Jianhua Li , HuaGuo Liang\",\"doi\":\"10.1016/j.mejo.2025.106684\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, with recent studies implementing NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we designed the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource constrained NoC environment. Compared with the benchmark strategy, our solution reduces the average classification delay of the accelerator by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. 
This proves the superior performance of the mechanism.</div></div>\",\"PeriodicalId\":49818,\"journal\":{\"name\":\"Microelectronics Journal\",\"volume\":\"160 \",\"pages\":\"Article 106684\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microelectronics Journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S187923912500133X\",\"RegionNum\":3,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microelectronics Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S187923912500133X","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
PRLM: A parallel loading mechanism for a deep neural network accelerator based on NoC
Network-on-Chip (NoC) has emerged as an efficient interconnect solution for multiprocessor systems, and recent studies have implemented NoC-based DNN accelerators. By interconnecting neural processing units via NoC, such designs minimize off-chip memory access, effectively reducing inference latency and power consumption. In this article, we design the MIAO router, which supports parallel model loading and improves efficiency. It modifies traditional routers to both speed up multicast packet forwarding and optimize parallel data loading. Additionally, our path-based multicast routing minimizes redundant packets by using shared transmission paths, enhancing overall network performance. We evaluated the LeNet-5 and VGG-16 models on a simulation platform in a resource-constrained NoC environment. Compared with the benchmark strategy, our solution reduces the accelerator's average classification delay by 38.01%, packet delay by 33.16%, and packet count by 49.95%, without significantly increasing power consumption or area. These results demonstrate the superior performance of the mechanism.
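To illustrate the idea behind the path-based multicast described above (a single packet follows a shared path through all destination routers, rather than one unicast copy per destination), the following is a minimal Python sketch. It is not the paper's MIAO router implementation; the mesh coordinates, visiting order, and hop model are assumptions introduced purely for illustration.

```python
# Illustrative sketch only (not the paper's MIAO router): path-based multicast
# on a 2D-mesh NoC. One packet traverses a shared path visiting every
# destination router, instead of injecting one unicast copy per destination.
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) router coordinate in the mesh (assumed layout)

def xy_hops(src: Node, dst: Node) -> int:
    """Hop count between two routers under dimension-ordered (XY) routing."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])

def unicast_packets(src: Node, dsts: List[Node]) -> int:
    """Baseline: one separate packet is injected per destination."""
    return len(dsts)

def path_based_multicast_packets(src: Node, dsts: List[Node]) -> int:
    """Path-based multicast: a single packet follows the shared path."""
    return 1 if dsts else 0

def shared_path_hops(src: Node, dsts: List[Node]) -> int:
    """Total hops of one shared path visiting destinations in sorted order."""
    total, cur = 0, src
    for d in sorted(dsts):  # simple column-then-row visiting order (assumption)
        total += xy_hops(cur, d)
        cur = d
    return total

if __name__ == "__main__":
    src = (0, 0)
    dsts = [(1, 0), (2, 0), (2, 1), (3, 1)]
    print("unicast packets:", unicast_packets(src, dsts))                  # 4
    print("multicast packets:", path_based_multicast_packets(src, dsts))   # 1
    print("shared-path hops:", shared_path_hops(src, dsts))
```

With four destinations, the baseline injects four packets while the path-based scheme injects a single packet over the shared path, which is the intuition behind the packet-count reduction reported in the abstract.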
Journal introduction:
Published since 1969, the Microelectronics Journal is an international forum for the dissemination of research and applications of microelectronic systems, circuits, and emerging technologies. Papers published in the Microelectronics Journal have undergone peer review to ensure originality, relevance, and timeliness. The journal thus provides a worldwide, regular, and comprehensive update on microelectronic circuits and systems.
The Microelectronics Journal invites papers describing significant research and applications in all of the areas listed below. Comprehensive review/survey papers covering recent developments will also be considered. The Microelectronics Journal covers circuits and systems. This topic includes but is not limited to: Analog, digital, mixed, and RF circuits and related design methodologies; Logic, architectural, and system level synthesis; Testing, design for testability, built-in self-test; Area, power, and thermal analysis and design; Mixed-domain simulation and design; Embedded systems; Non-von Neumann computing and related technologies and circuits; Design and test of high complexity systems integration; SoC, NoC, SIP, and NIP design and test; 3-D integration design and analysis; Emerging device technologies and circuits, such as FinFETs, SETs, spintronics, SFQ, MTJ, etc.
Application aspects, such as signal and image processing (including circuits for cryptography), sensors and actuators (including sensor networks), reliability and quality issues, and economic models, are also welcome.