Breaking the Edge: Enabling Efficient Neural Network Inference on Integrated Edge Devices

Impact Factor 5.3 · JCR Q1 (Computer Science, Information Systems) · CAS Tier 2 (Computer Science)
Feng Zhang;Chenyang Zhang;Jiawei Guan;Qiangjun Zhou;Kuangyu Chen;Xiao Zhang;Bingsheng He;Jidong Zhai;Xiaoyong Du
DOI: 10.1109/TCC.2025.3559346
Journal: IEEE Transactions on Cloud Computing, vol. 13, no. 2, pp. 694-710
Published: 2025-04-09
Article page: https://ieeexplore.ieee.org/document/10959707/
Citations: 0

Abstract

Edge computing has gained widespread attention in cloud computing due to the increasing demands of AIoT applications and the evolution of edge architectures. One prevalent application in this domain is neural network inference at the edge. This article presents an in-depth exploration of inference on integrated edge devices and introduces EdgeNN, a groundbreaking inference solution specifically designed for CPU-GPU integrated edge devices. EdgeNN offers three key innovations. First, EdgeNN adaptively employs zero-copy optimization by harnessing unified physical memory. Second, EdgeNN introduces an innovative approach to CPU-GPU hybrid execution tailored for inference tasks. This technique enables concurrent CPU and GPU operation, effectively leveraging edge platforms' computational capabilities. Third, EdgeNN adopts a finely tuned adaptive inference tuning technique that analyzes complex inference structures. It divides computations into sub-tasks, intelligently assigning them to the two processors for better performance. Experimental results demonstrate EdgeNN's superiority across six popular neural network inference workloads. EdgeNN delivers average speedups of 3.97×, 4.10×, 3.12×, and 8.80× over inference on four distinct edge CPUs. Furthermore, EdgeNN achieves significant time advantages compared to the direct execution of the original programs. This improvement is attributed to better unified memory utilization (44.37%) and the innovative CPU-GPU hybrid execution approach (17.91%). Additionally, EdgeNN exhibits superior energy efficiency, providing 29.14× higher energy efficiency than edge CPUs and 5.70× higher energy efficiency than discrete GPUs. EdgeNN is now open source at https://github.com/ChenyangZhang-cs/EdgeNN.
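To make the third idea concrete, the sketch below illustrates one simple way sub-tasks could be split between a CPU and a GPU based on per-task cost estimates: a greedy heuristic that considers expensive sub-tasks first and assigns each to whichever processor would finish earlier. All names, costs, and the heuristic itself are hypothetical illustrations of the general idea, not EdgeNN's actual adaptive tuning technique, which the paper describes in far more detail.

```python
def partition(tasks):
    """Greedily split sub-tasks between CPU and GPU.

    tasks: list of (name, cpu_cost, gpu_cost) tuples with estimated
    execution times. Returns (cpu_tasks, gpu_tasks, makespan).
    """
    cpu_tasks, gpu_tasks = [], []
    cpu_time = gpu_time = 0.0
    # Consider the most expensive sub-tasks first (LPT-style greedy),
    # then place each where it would complete soonest.
    for name, cpu_cost, gpu_cost in sorted(
            tasks, key=lambda t: min(t[1], t[2]), reverse=True):
        if cpu_time + cpu_cost <= gpu_time + gpu_cost:
            cpu_tasks.append(name)
            cpu_time += cpu_cost
        else:
            gpu_tasks.append(name)
            gpu_time += gpu_cost
    # Both processors run concurrently, so the slower one bounds latency.
    return cpu_tasks, gpu_tasks, max(cpu_time, gpu_time)

# Hypothetical per-layer cost estimates (ms) on an integrated device.
layers = [("conv1", 8.0, 2.0), ("conv2", 9.0, 2.5),
          ("pool", 1.0, 1.5), ("fc", 3.0, 4.0)]
cpu, gpu, makespan = partition(layers)
```

With these sample costs, the convolutions land on the GPU while the small fully-connected and pooling layers run on the CPU, so both processors stay busy instead of the CPU idling while the GPU works.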
Source journal: IEEE Transactions on Cloud Computing (Computer Science, Software)
CiteScore: 9.40
Self-citation rate: 6.20%
Articles per year: 167
Journal description: The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.