Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge

S. Zobaed, Ali Mokhtari, J. Champati, M. Kourouma, M. Salehi
{"title":"Edge-MultiAI: Multi-Tenancy of Latency-Sensitive Deep Learning Applications on Edge","authors":"S. Zobaed, Ali Mokhtari, J. Champati, M. Kourouma, M. Salehi","doi":"10.1109/UCC56403.2022.00012","DOIUrl":null,"url":null,"abstract":"Smart IoT-based systems often desire continuous execution of multiple latency-sensitive Deep Learning (DL) applications. The edge servers serve as the cornerstone of such IoT based systems, however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that, DL applications function based on bulky “neural network (NN) models” that cannot be simultaneously maintained in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby, meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to stimulate multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that functions based on the Bayesian theory to predict the inference requests for multi-tenant applications, and uses it to choose the appropriate NN models for loading, hence, increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can stimulate the degree of multi-tenancy on the edge by at least 2× and increase the number of warm-starts by ≈60% without any major loss on the inference accuracy of the applications.","PeriodicalId":203244,"journal":{"name":"2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing (UCC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing (UCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UCC56403.2022.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

Smart IoT-based systems often require continuous execution of multiple latency-sensitive Deep Learning (DL) applications. Edge servers serve as the cornerstone of such IoT-based systems; however, their resource limitations hamper the continuous execution of multiple (multi-tenant) DL applications. The challenge is that DL applications rely on bulky "neural network (NN) models" that cannot all be kept simultaneously in the limited memory space of the edge. Accordingly, the main contribution of this research is to overcome the memory contention challenge, thereby meeting the latency constraints of the DL applications without compromising their inference accuracy. We propose an efficient NN model management framework, called Edge-MultiAI, that ushers the NN models of the DL applications into the edge memory such that the degree of multi-tenancy and the number of warm-starts are maximized. Edge-MultiAI leverages NN model compression techniques, such as model quantization, and dynamically loads NN models for DL applications to stimulate multi-tenancy on the edge server. We also devise a model management heuristic for Edge-MultiAI, called iWS-BFE, that uses Bayesian inference to predict the inference requests of the multi-tenant applications and chooses the appropriate NN models to load accordingly, thereby increasing the number of warm-start inferences. We evaluate the efficacy and robustness of Edge-MultiAI under various configurations. The results reveal that Edge-MultiAI can increase the degree of multi-tenancy on the edge by at least 2× and the number of warm-starts by ≈60% without any major loss in the inference accuracy of the applications.
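To make the idea concrete, below is a minimal Python sketch of the core mechanism as the abstract describes it: keep several compressed (quantized) variants of each application's NN model, estimate how likely each application is to issue the next inference request, and greedily keep warm the most accurate variant that still fits in edge memory. All class names, model sizes, accuracies, and probabilities are illustrative assumptions; the actual Edge-MultiAI / iWS-BFE implementation is not given in the abstract.

```python
# Hedged sketch, assuming a simplified model-manager interface; the names
# (ModelVariant, EdgeMemory, warm_start_plan) and all numbers below are
# hypothetical and not taken from the Edge-MultiAI paper.
from dataclasses import dataclass, field

@dataclass
class ModelVariant:
    app: str           # owning DL application
    size_mb: float     # memory footprint; quantized variants are smaller
    accuracy: float    # relative inference accuracy of this variant

@dataclass
class EdgeMemory:
    capacity_mb: float
    loaded: dict = field(default_factory=dict)  # app name -> warm ModelVariant

    def used_mb(self) -> float:
        return sum(v.size_mb for v in self.loaded.values())

    def can_fit(self, variant: ModelVariant) -> bool:
        return self.used_mb() + variant.size_mb <= self.capacity_mb

def warm_start_plan(memory, variants, request_prob):
    """For each application, in decreasing order of its estimated probability
    of issuing the next inference request (e.g., a Bayesian posterior over
    recent arrivals), keep warm the most accurate model variant that still
    fits; fall back to smaller quantized variants to raise multi-tenancy."""
    for app in sorted(request_prob, key=request_prob.get, reverse=True):
        for variant in sorted(variants[app], key=lambda v: v.accuracy, reverse=True):
            if memory.can_fit(variant):
                memory.loaded[app] = variant  # warm-start future requests
                break
    return memory.loaded

# Illustrative usage: three tenant applications, each with a full-precision
# and a quantized variant, competing for 600 MB of edge memory.
variants = {
    "face_id":  [ModelVariant("face_id", 400, 0.95), ModelVariant("face_id", 100, 0.93)],
    "speech":   [ModelVariant("speech", 300, 0.97), ModelVariant("speech", 80, 0.94)],
    "obstacle": [ModelVariant("obstacle", 500, 0.96), ModelVariant("obstacle", 120, 0.92)],
}
posterior = {"face_id": 0.5, "speech": 0.3, "obstacle": 0.2}  # assumed request estimates
plan = warm_start_plan(EdgeMemory(capacity_mb=600), variants, posterior)
for app, v in plan.items():
    print(f"{app}: keep {v.size_mb:.0f} MB variant warm (accuracy {v.accuracy})")
```

Under these assumed numbers, falling back to quantized variants lets all three applications stay resident (600 MB used) instead of only the single full-precision face_id model, which is the kind of multi-tenancy gain the abstract reports.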