DNNPipe: Dynamic programming-based optimal DNN partitioning for pipelined inference on IoT networks

Woobean Seo, Saehwa Kim, Seongsoo Hong

DOI: 10.1016/j.sysarc.2025.103462
Journal of Systems Architecture, Volume 166, Article 103462
Published: 2025-05-18 (Journal Impact Factor: 3.7, JCR Q1, Computer Science, Hardware & Architecture)
URL: https://www.sciencedirect.com/science/article/pii/S1383762125001341
Citations: 0
Abstract
Pipeline parallelization is an effective technique that enables the efficient execution of deep neural network (DNN) inference on resource-constrained IoT devices. To enable pipeline parallelization across computing nodes with asymmetric performance profiles, interconnected via low-latency, high-bandwidth networks, we propose DNNPipe, a DNN partitioning algorithm that constructs a pipeline plan for a given DNN. The primary objective of DNNPipe is to maximize the throughput of DNN inference while minimizing the runtime overhead of DNN partitioning, which is repeatedly executed online in dynamically changing IoT environments. To achieve this, DNNPipe uses dynamic programming (DP) with optimality-preserving pruning techniques to explore the search space and find the optimal pipeline plan, i.e., the plan whose maximum stage time is no greater than that of any other possible pipeline plan. Specifically, it aggressively prunes suboptimal pipeline plans using two pruning techniques: upper-bound-based pruning and under-utilized-stage pruning. Our experimental results demonstrate that pipelined inference using the resulting optimal pipeline plan improves DNN throughput by up to 1.78 times compared to the highest-performing single device, and that DNNPipe achieves up to 98.26% lower runtime overhead than PipeEdge, the fastest known optimal DNN partitioning algorithm.
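To make the optimization objective concrete, the following is a minimal sketch of the kind of DP formulation the abstract describes: partitioning a chain of DNN layers into contiguous stages, one per device, so that the maximum (bottleneck) stage time is minimized. This is an illustrative simplification, not DNNPipe itself: the layer costs and device speeds are hypothetical inputs, stage time is modeled as summed layer cost divided by device speed, and communication costs and the paper's two pruning techniques are omitted.

```python
def optimal_pipeline_plan(layer_costs, device_speeds):
    """Split a layer chain into contiguous stages, one stage per device,
    minimizing the bottleneck stage time (simplified model: stage time =
    sum of assigned layer costs / device speed; no communication cost).

    Returns (bottleneck_time, stages), where stages is a list of
    (start, end) half-open layer ranges, one per device in order.
    """
    n, k = len(layer_costs), len(device_speeds)
    prefix = [0.0]                       # prefix sums for O(1) range cost
    for c in layer_costs:
        prefix.append(prefix[-1] + c)

    INF = float("inf")
    # dp[d][i]: best bottleneck time placing the first i layers on the
    # first d devices; cut[d][i] records the chosen stage boundary.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[-1] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0

    for d in range(1, k + 1):
        for i in range(1, n + 1):
            for j in range(d - 1, i):    # layers j..i-1 go to device d-1
                stage = (prefix[i] - prefix[j]) / device_speeds[d - 1]
                cand = max(dp[d - 1][j], stage)
                if cand < dp[d][i]:
                    dp[d][i], cut[d][i] = cand, j

    # Walk the cut table backwards to recover the stage boundaries.
    stages, i = [], n
    for d in range(k, 0, -1):
        stages.append((cut[d][i], i))
        i = cut[d][i]
    return dp[k][n], stages[::-1]
```

For example, splitting layers with costs [4, 2, 3, 1] across two equal-speed devices yields stages (0, 1) and (1, 4) with a bottleneck time of 6.0. DNNPipe's contribution is reaching the same optimal plan far faster by pruning the DP's search space (upper-bound-based and under-utilized-stage pruning), which this sketch does not attempt.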
Journal introduction:
The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software.
Design automation of such systems, including methodologies, techniques, and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.