Resource-Efficient DNN Inference With Early Exiting in Serverless Edge Computing
Authors: Xiaolin Guo; Fang Dong; Dian Shen; Zhaowu Huang; Jinghui Zhang
Journal: IEEE Transactions on Mobile Computing, vol. 24, no. 5, pp. 3650-3666 (JCR Q1, Computer Science, Information Systems; impact factor 7.7)
Published: 2024-12-11
DOI: 10.1109/TMC.2024.3514993
URL: https://ieeexplore.ieee.org/document/10787262/
Serverless Edge Computing (SEC) has been widely adopted to improve resource utilization, owing to its event-triggered execution model. However, deploying deep neural network (DNN) inference services directly in SEC leads to resource inefficiencies, which stem from two key factors. First, existing methods adopt model-wise function encapsulation, which requires the entire DNN model to occupy memory throughout its execution lifecycle, increasing both memory footprint and occupancy time. Second, applying uniform DNN inference to diverse inputs leads to redundant computation and additional inference time. To address these issues, we propose REDI, a novel framework that leverages fine-grained block-wise function encapsulation and progressive inference to provide resource-efficient DNN inference while meeting latency requirements. REDI releases the memory held by shallow blocks once they have finished inference, and allows each request to exit early based on the complexity of its input data, eliminating redundant computation. To fully realize this potential, REDI jointly considers resource heterogeneity, data diversity, and environment dynamics when addressing the block-wise function placement problem. We introduce an uncertainty-aware, online-learning-driven algorithm with bounded regret. Finally, we conduct extensive trace-driven experiments to evaluate our methods, demonstrating that REDI achieves an improvement of up to $6.52\times$ in resource usage cost compared to state-of-the-art methods.
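The abstract's two ideas, block-wise encapsulation and early exiting, can be illustrated with a minimal sketch. This is not REDI's implementation: the `Block` type, the confidence-threshold exit rule, the toy doubling blocks, and the memory figures are all assumptions made for illustration. Each block stands in for one serverless function, and only the block that is currently running is assumed to be resident in memory.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Block:
    """Hypothetical stand-in for one block-wise serverless function."""
    memory_mb: float
    forward: Callable[[List[float]], List[float]]    # feature transform
    exit_head: Callable[[List[float]], List[float]]  # features -> class logits


@dataclass
class Result:
    label: int
    exit_index: int        # index of the block at which the request exited
    peak_memory_mb: float  # largest single block that had to be resident


def progressive_inference(blocks: Sequence[Block], x: List[float],
                          threshold: float) -> Result:
    """Run blocks in order and exit at the first sufficiently confident head.

    Assumption: confidence is the maximum softmax probability. The last
    block always produces an answer, whatever its confidence.
    """
    features, peak = x, 0.0
    for i, block in enumerate(blocks):
        features = block.forward(features)
        # Earlier blocks are assumed released, so only this one counts.
        peak = max(peak, block.memory_mb)
        probs = softmax(block.exit_head(features))
        confidence = max(probs)
        if confidence >= threshold or i == len(blocks) - 1:
            return Result(probs.index(confidence), i, peak)
    raise ValueError("blocks must not be empty")


# Toy model: every block doubles the features; the exit head is the identity.
toy_blocks = [
    Block(memory_mb=mb,
          forward=lambda f: [2.0 * v for v in f],
          exit_head=lambda f: list(f))
    for mb in (10.0, 20.0, 30.0)
]

# Memory a model-wise function would hold for the whole request.
model_wise_memory_mb = sum(b.memory_mb for b in toy_blocks)

easy = progressive_inference(toy_blocks, [2.0, 0.0], threshold=0.9)
hard = progressive_inference(toy_blocks, [0.2, 0.0], threshold=0.9)
```

In this toy setup the "easy" input becomes confident after the first block and never touches the deeper ones, while the "hard" input runs through all three. In both cases the peak resident memory is a single block rather than `model_wise_memory_mb`. Which node hosts each block (the placement problem the abstract mentions) is deliberately left out of the sketch.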
About the journal:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.