Feng Zhang;Chenyang Zhang;Jiawei Guan;Qiangjun Zhou;Kuangyu Chen;Xiao Zhang;Bingsheng He;Jidong Zhai;Xiaoyong Du
{"title":"突破边缘:在集成边缘设备上实现高效神经网络推理","authors":"Feng Zhang;Chenyang Zhang;Jiawei Guan;Qiangjun Zhou;Kuangyu Chen;Xiao Zhang;Bingsheng He;Jidong Zhai;Xiaoyong Du","doi":"10.1109/TCC.2025.3559346","DOIUrl":null,"url":null,"abstract":"Edge computing has gained widespread attention in cloud computing due to the increasing demands of AIoT applications and the evolution of edge architectures. One prevalent application in this domain is neural network inference on edge for computing and processing. This article presents an in-depth exploration of inference on integrated edge devices and introduces EdgeNN, a groundbreaking solution for inference specifically designed for CPU-GPU integrated edge devices. EdgeNN offers three key innovations. First, EdgeNN adaptively employs <italic>zero-copy</i> optimization by harnessing unified physical memory. Second, EdgeNN introduces an innovative approach to CPU-GPU hybrid execution tailored for inference tasks. This technique enables concurrent CPU and GPU operation, effectively leveraging edge platforms’ computational capabilities. Third, EdgeNN adopts a finely tuned adaptive inference tuning technique that analyzes complex inference structures. It divides computations into sub-tasks, intelligently assigning them to the two processors for better performance. Experimental results demonstrate EdgeNN's superiority across six popular neural network inference processing. EdgeNN delivers average speed improvements of 3.97×, 4.10×, 3.12×, and 8.80× when compared to inference on four distinct edge CPUs. Furthermore, EdgeNN achieves significant time advantages compared to the direct execution of original programs. This improvement is attributed to better unified memory utilization (44.37%) and the innovative CPU-GPU hybrid execution approach (17.91%). Additionally, EdgeNN exhibits superior energy efficiency, providing 29.14× higher energy efficiency than edge CPUs and 5.70× higher energy efficiency than discrete GPUs. EdgeNN is now open source at <uri>https://github.com/ChenyangZhang-cs/EdgeNN</uri>.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 2","pages":"694-710"},"PeriodicalIF":5.3000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Breaking the Edge: Enabling Efficient Neural Network Inference on Integrated Edge Devices\",\"authors\":\"Feng Zhang;Chenyang Zhang;Jiawei Guan;Qiangjun Zhou;Kuangyu Chen;Xiao Zhang;Bingsheng He;Jidong Zhai;Xiaoyong Du\",\"doi\":\"10.1109/TCC.2025.3559346\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Edge computing has gained widespread attention in cloud computing due to the increasing demands of AIoT applications and the evolution of edge architectures. One prevalent application in this domain is neural network inference on edge for computing and processing. This article presents an in-depth exploration of inference on integrated edge devices and introduces EdgeNN, a groundbreaking solution for inference specifically designed for CPU-GPU integrated edge devices. EdgeNN offers three key innovations. First, EdgeNN adaptively employs <italic>zero-copy</i> optimization by harnessing unified physical memory. Second, EdgeNN introduces an innovative approach to CPU-GPU hybrid execution tailored for inference tasks. This technique enables concurrent CPU and GPU operation, effectively leveraging edge platforms’ computational capabilities. 
Third, EdgeNN adopts a finely tuned adaptive inference tuning technique that analyzes complex inference structures. It divides computations into sub-tasks, intelligently assigning them to the two processors for better performance. Experimental results demonstrate EdgeNN's superiority across six popular neural network inference processing. EdgeNN delivers average speed improvements of 3.97×, 4.10×, 3.12×, and 8.80× when compared to inference on four distinct edge CPUs. Furthermore, EdgeNN achieves significant time advantages compared to the direct execution of original programs. This improvement is attributed to better unified memory utilization (44.37%) and the innovative CPU-GPU hybrid execution approach (17.91%). Additionally, EdgeNN exhibits superior energy efficiency, providing 29.14× higher energy efficiency than edge CPUs and 5.70× higher energy efficiency than discrete GPUs. EdgeNN is now open source at <uri>https://github.com/ChenyangZhang-cs/EdgeNN</uri>.\",\"PeriodicalId\":13202,\"journal\":{\"name\":\"IEEE Transactions on Cloud Computing\",\"volume\":\"13 2\",\"pages\":\"694-710\"},\"PeriodicalIF\":5.3000,\"publicationDate\":\"2025-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cloud Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10959707/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10959707/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Breaking the Edge: Enabling Efficient Neural Network Inference on Integrated Edge Devices
Edge computing has gained widespread attention in cloud computing due to the increasing demands of AIoT applications and the evolution of edge architectures. One prevalent application in this domain is neural network inference at the edge. This article presents an in-depth exploration of inference on integrated edge devices and introduces EdgeNN, a groundbreaking inference solution specifically designed for CPU-GPU integrated edge devices. EdgeNN offers three key innovations. First, EdgeNN adaptively employs zero-copy optimization by harnessing unified physical memory. Second, EdgeNN introduces an innovative approach to CPU-GPU hybrid execution tailored for inference tasks, enabling the CPU and GPU to operate concurrently and effectively leveraging edge platforms' computational capabilities. Third, EdgeNN adopts an adaptive inference tuning technique that analyzes complex inference structures, divides their computations into sub-tasks, and intelligently assigns the sub-tasks to the two processors for better performance. Experimental results demonstrate EdgeNN's superiority across six popular neural network inference workloads. EdgeNN delivers average speedups of 3.97×, 4.10×, 3.12×, and 8.80× over inference on four distinct edge CPUs. Furthermore, EdgeNN achieves significant time advantages over direct execution of the original programs, attributable to better unified memory utilization (44.37%) and the CPU-GPU hybrid execution approach (17.91%). Additionally, EdgeNN exhibits superior energy efficiency, providing 29.14× higher energy efficiency than edge CPUs and 5.70× higher than discrete GPUs. EdgeNN is open source at https://github.com/ChenyangZhang-cs/EdgeNN.
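To make the zero-copy idea concrete, the following is a minimal, hypothetical CUDA sketch (not EdgeNN's actual code; see the repository above for that) of how an integrated device such as an NVIDIA Jetson can map a host buffer into the GPU's address space so a kernel reads and writes it in place, avoiding the explicit cudaMemcpy transfers a discrete GPU would need. The relu kernel and buffer size are illustrative assumptions standing in for a real inference layer.

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative ReLU kernel standing in for one inference layer.
__global__ void relu(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] > 0.0f ? data[i] : 0.0f;
}

int main() {
    const int n = 1 << 20;

    // Allow host allocations to be mapped into the device address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate mapped (zero-copy) host memory. On a CPU-GPU integrated
    // device this buffer lives in the single shared physical DRAM, so the
    // GPU accesses it directly instead of through a cudaMemcpy.
    float* h_data = nullptr;
    cudaHostAlloc((void**)&h_data, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_data[i] = i - n / 2.0f;  // CPU writes input

    // Device-side alias of the same physical buffer.
    float* d_data = nullptr;
    cudaHostGetDevicePointer((void**)&d_data, h_data, 0);

    relu<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();  // after this, the CPU sees the GPU's results

    printf("h_data[0] = %f\n", h_data[0]);  // negative input clamped to 0.0
    cudaFreeHost(h_data);
    return 0;
}

The same API is valid on a discrete GPU, but there the mapped accesses cross PCIe and can be slower than an explicit copy, which is presumably why zero-copy pays off on integrated devices in particular and why the abstract describes EdgeNN as applying it adaptively rather than unconditionally.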
Journal Introduction:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.