无服务器边缘计算中的资源高效DNN推理

IF 7.7 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Mobile Computing Pub Date : 2024-12-11 DOI:10.1109/TMC.2024.3514993

Xiaolin Guo;Fang Dong;Dian Shen;Zhaowu Huang;Jinghui Zhang

{"title":"无服务器边缘计算中的资源高效DNN推理","authors":"Xiaolin Guo;Fang Dong;Dian Shen;Zhaowu Huang;Jinghui Zhang","doi":"10.1109/TMC.2024.3514993","DOIUrl":null,"url":null,"abstract":"Serverless Edge Computing (SEC) has gained widespread adoption in improving resource utilization due to its triggered event-driven model. However, deploying deep neural network (DNN) inference services directly in SEC leads to resource inefficiencies, which stem from two key factors. First, existing methods adopt model-wise function encapsulation, which requires the entire DNN model to occupy memory throughout its execution lifecycle. This increases both memory footprint and occupancy time. Second, uniform DNN inference for diversity input leads to redundant computations and additional inference time. To this end, we propose REDI, a novel framework that leverages fine-grained block-wise function encapsulation and progressive inference to provide resource-efficient DNN inference while ensuring latency requirements. REDI enables the release of memory from already inferred shallow networks and allows each request to exit early based on input data complexity, eliminating redundant computations. To fully unleash the potential, REDI jointly considers resource heterogeneity, data diversity, and environment dynamics to investigate the block-wise function placement problem. We introduce an uncertainty-aware online learning-driven algorithm with bounded regret. Finally, we conduct extensive trace-driven experiments to evaluate our methods, demonstrating that REDI achieves a significant speedup of up to <inline-formula><tex-math>$6.52\\times$</tex-math></inline-formula> in terms of resource usage cost compared to state-of-the-art methods.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 5","pages":"3650-3666"},"PeriodicalIF":7.7000,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Resource-Efficient DNN Inference With Early Exiting in Serverless Edge Computing\",\"authors\":\"Xiaolin Guo;Fang Dong;Dian Shen;Zhaowu Huang;Jinghui Zhang\",\"doi\":\"10.1109/TMC.2024.3514993\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Serverless Edge Computing (SEC) has gained widespread adoption in improving resource utilization due to its triggered event-driven model. However, deploying deep neural network (DNN) inference services directly in SEC leads to resource inefficiencies, which stem from two key factors. First, existing methods adopt model-wise function encapsulation, which requires the entire DNN model to occupy memory throughout its execution lifecycle. This increases both memory footprint and occupancy time. Second, uniform DNN inference for diversity input leads to redundant computations and additional inference time. To this end, we propose REDI, a novel framework that leverages fine-grained block-wise function encapsulation and progressive inference to provide resource-efficient DNN inference while ensuring latency requirements. REDI enables the release of memory from already inferred shallow networks and allows each request to exit early based on input data complexity, eliminating redundant computations. To fully unleash the potential, REDI jointly considers resource heterogeneity, data diversity, and environment dynamics to investigate the block-wise function placement problem. We introduce an uncertainty-aware online learning-driven algorithm with bounded regret. Finally, we conduct extensive trace-driven experiments to evaluate our methods, demonstrating that REDI achieves a significant speedup of up to <inline-formula><tex-math>$6.52\\\\times$</tex-math></inline-formula> in terms of resource usage cost compared to state-of-the-art methods.\",\"PeriodicalId\":50389,\"journal\":{\"name\":\"IEEE Transactions on Mobile Computing\",\"volume\":\"24 5\",\"pages\":\"3650-3666\"},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2024-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Mobile Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10787262/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10787262/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

无服务器边缘计算（SEC）因其触发式事件驱动模型而在提高资源利用率方面获得了广泛应用。然而，直接在 SEC 中部署深度神经网络（DNN）推理服务会导致资源利用效率低下，这主要源于两个关键因素。首先，现有方法采用模型化函数封装，这就要求整个 DNN 模型在整个执行生命周期内占用内存。这增加了内存占用和占用时间。其次，针对多样性输入的统一 DNN 推理会导致冗余计算和额外的推理时间。为此，我们提出了 REDI，这是一种新型框架，它利用细粒度的分块函数封装和渐进推理来提供资源节约型 DNN 推理，同时确保延迟要求。REDI 能够释放已推理出的浅层网络的内存，并允许每个请求根据输入数据的复杂性提前退出，从而消除冗余计算。为了充分释放潜力，REDI 联合考虑了资源异质性、数据多样性和环境动态，以研究分块函数放置问题。我们引入了一种具有有界遗憾的不确定性感知在线学习驱动算法。最后，我们进行了广泛的跟踪实验来评估我们的方法，结果表明，与最先进的方法相比，REDI 在资源使用成本方面实现了高达 6.52 美元/次的显著提速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Resource-Efficient DNN Inference With Early Exiting in Serverless Edge Computing

Serverless Edge Computing (SEC) has gained widespread adoption in improving resource utilization due to its triggered event-driven model. However, deploying deep neural network (DNN) inference services directly in SEC leads to resource inefficiencies, which stem from two key factors. First, existing methods adopt model-wise function encapsulation, which requires the entire DNN model to occupy memory throughout its execution lifecycle. This increases both memory footprint and occupancy time. Second, uniform DNN inference for diversity input leads to redundant computations and additional inference time. To this end, we propose REDI, a novel framework that leverages fine-grained block-wise function encapsulation and progressive inference to provide resource-efficient DNN inference while ensuring latency requirements. REDI enables the release of memory from already inferred shallow networks and allows each request to exit early based on input data complexity, eliminating redundant computations. To fully unleash the potential, REDI jointly considers resource heterogeneity, data diversity, and environment dynamics to investigate the block-wise function placement problem. We introduce an uncertainty-aware online learning-driven algorithm with bounded regret. Finally, we conduct extensive trace-driven experiments to evaluate our methods, demonstrating that REDI achieves a significant speedup of up to

$6.52\times$

in terms of resource usage cost compared to state-of-the-art methods.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Mobile Computing 工程技术-电信学

CiteScore

12.90

自引率

2.50%

发文量

403

审稿时长

6.6 months

期刊介绍： IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.