AdaKnife: Flexible DNN Offloading for Inference Acceleration on Heterogeneous Mobile Devices
Authors: Sicong Liu; Hao Luo; XiaoChen Li; Yao Li; Bin Guo; Zhiwen Yu; YuZhan Wang; Ke Ma; YaSan Ding; Yuan Yao
Journal: IEEE Transactions on Mobile Computing, vol. 24, no. 2, pp. 736-748
Published: 2024-09-30
DOI: 10.1109/TMC.2024.3466931
URL: https://ieeexplore.ieee.org/document/10700984/
The integration of deep neural network (DNN) intelligence into embedded mobile devices is expanding rapidly, supporting a wide range of applications. DNN compression techniques, which adapt models to resource-constrained mobile environments, often force a trade-off between efficiency and accuracy. Distributed DNN inference, leveraging multiple mobile devices, emerges as a promising alternative to enhance inference efficiency without compromising accuracy. However, effectively decoupling DNN models into fine-grained components for optimal parallel acceleration presents significant challenges. Current partitioning methods, including layer-level and operator or channel-level partitioning, provide only partial solutions and struggle with the heterogeneous nature of DNN compilation frameworks, complicating direct model offloading. In response, we introduce AdaKnife, an adaptive framework for accelerated inference across heterogeneous mobile devices. AdaKnife enables on-demand mixed-granularity DNN partitioning via computational graph analysis, facilitates efficient cross-framework model transitions with operator optimization for offloading, and improves the feasibility of parallel partitioning using a greedy operator parallelism algorithm. Our empirical studies show that AdaKnife achieves a 66.5% reduction in latency compared to baselines.
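The abstract names a greedy operator parallelism algorithm but does not spell it out. As a rough illustration of the general idea only, the sketch below greedily places each operator of a small computation graph on whichever device would finish it earliest. It is not AdaKnife's actual algorithm: the function name `greedy_schedule`, the cost model (work divided by device speed), the fixed `transfer_cost`, and the example operators and devices are all assumptions made for this sketch.

```python
from collections import deque


def greedy_schedule(ops, deps, device_speed, transfer_cost=0.0):
    """Greedy earliest-finish-time placement of operators on devices.

    ops:           {operator name: amount of work}          (hypothetical units)
    deps:          {operator name: [predecessor names]}
    device_speed:  {device name: work units per time unit}  (hypothetical)
    transfer_cost: fixed delay when an input comes from another device
    Returns (placement, makespan).
    """
    # Build in-degrees and successor lists for a Kahn-style topological walk.
    indegree = {op: len(deps.get(op, [])) for op in ops}
    successors = {op: [] for op in ops}
    for op, preds in deps.items():
        for pred in preds:
            successors[pred].append(op)

    ready = deque(sorted(op for op, d in indegree.items() if d == 0))
    device_free = {dev: 0.0 for dev in device_speed}
    finish, placement = {}, {}

    while ready:
        op = ready.popleft()
        best = None
        for dev, speed in device_speed.items():
            # Inputs are available once every predecessor has finished,
            # plus a transfer delay if it ran on a different device.
            inputs_ready = max(
                (finish[p] + (0.0 if placement[p] == dev else transfer_cost)
                 for p in deps.get(op, [])),
                default=0.0,
            )
            end = max(device_free[dev], inputs_ready) + ops[op] / speed
            # Greedy choice: keep the device with the earliest finish time.
            if best is None or end < best[0]:
                best = (end, dev)
        finish[op], placement[op] = best
        device_free[best[1]] = best[0]

        # Release successors whose predecessors are now all scheduled.
        for nxt in successors[op]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    return placement, max(finish.values())


if __name__ == "__main__":
    ops = {"conv_a": 4.0, "conv_b": 2.0, "concat": 1.0}
    deps = {"concat": ["conv_a", "conv_b"]}
    devices = {"phone": 1.0, "tablet": 2.0}
    print(greedy_schedule(ops, deps, devices, transfer_cost=0.5))
```

In this toy graph the two independent convolution branches land on different devices, which is the kind of operator-level parallelism the abstract refers to. A greedy choice like this is not guaranteed to be optimal, and a real system would need measured per-device latencies and bandwidth-dependent transfer costs.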
About the journal:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.