DNNPipe: Dynamic programming-based optimal DNN partitioning for pipelined inference on IoT networks

Woobean Seo, Saehwa Kim, Seongsoo Hong

DOI: 10.1016/j.sysarc.2025.103462
Journal of Systems Architecture, Volume 166, Article 103462
Published: 2025-05-18 (Journal Impact Factor: 3.7, JCR Q1, Computer Science, Hardware & Architecture)
URL: https://www.sciencedirect.com/science/article/pii/S1383762125001341
Citations: 0
Abstract
Pipeline parallelization is an effective technique that enables the efficient execution of deep neural network (DNN) inference on resource-constrained IoT devices. To enable pipeline parallelization across computing nodes with asymmetric performance profiles, interconnected via low-latency, high-bandwidth networks, we propose DNNPipe, a DNN partitioning algorithm that constructs a pipeline plan for a given DNN. The primary objective of DNNPipe is to maximize the throughput of DNN inference while minimizing the runtime overhead of DNN partitioning, which is repeatedly executed online in dynamically changing IoT environments. To achieve this, DNNPipe uses dynamic programming (DP) with optimality-preserving pruning techniques to explore the search space and find the optimal pipeline plan, i.e., the plan whose maximum stage time is no greater than that of any other possible pipeline plan. Specifically, it aggressively prunes suboptimal pipeline plans using two pruning techniques: upper-bound-based pruning and under-utilized-stage pruning. Our experimental results demonstrate that pipelined inference using the resulting optimal pipeline plan improves DNN throughput by up to 1.78 times compared to the highest-performing single device, and that DNNPipe achieves up to 98.26% lower runtime overhead than PipeEdge, the fastest known optimal DNN partitioning algorithm.
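To make the optimization objective concrete, the following is a minimal sketch of the kind of DP formulation the abstract describes: partitioning a chain of DNN layers into contiguous stages, one per device, so that the maximum (bottleneck) stage time is minimized. This is an illustrative simplification, not DNNPipe itself: the layer costs and device speeds are hypothetical inputs, stage time is modeled as summed layer cost divided by device speed, and communication costs and the paper's two pruning techniques are omitted.

```python
def optimal_pipeline_plan(layer_costs, device_speeds):
    """Split a layer chain into contiguous stages, one stage per device,
    minimizing the bottleneck stage time (simplified model: stage time =
    sum of assigned layer costs / device speed; no communication cost).

    Returns (bottleneck_time, stages), where stages is a list of
    (start, end) half-open layer ranges, one per device in order.
    """
    n, k = len(layer_costs), len(device_speeds)
    prefix = [0.0]                       # prefix sums for O(1) range cost
    for c in layer_costs:
        prefix.append(prefix[-1] + c)

    INF = float("inf")
    # dp[d][i]: best bottleneck time placing the first i layers on the
    # first d devices; cut[d][i] records the chosen stage boundary.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[-1] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0

    for d in range(1, k + 1):
        for i in range(1, n + 1):
            for j in range(d - 1, i):    # layers j..i-1 go to device d-1
                stage = (prefix[i] - prefix[j]) / device_speeds[d - 1]
                cand = max(dp[d - 1][j], stage)
                if cand < dp[d][i]:
                    dp[d][i], cut[d][i] = cand, j

    # Walk the cut table backwards to recover the stage boundaries.
    stages, i = [], n
    for d in range(k, 0, -1):
        stages.append((cut[d][i], i))
        i = cut[d][i]
    return dp[k][n], stages[::-1]
```

For example, splitting layers with costs [4, 2, 3, 1] across two equal-speed devices yields stages (0, 1) and (1, 4) with a bottleneck time of 6.0. DNNPipe's contribution is reaching the same optimal plan far faster by pruning the DP's search space (upper-bound-based and under-utilized-stage pruning), which this sketch does not attempt.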
Journal introduction:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.