ProKube: Proactive Kubernetes Orchestrator for Inference in Heterogeneous Edge Computing

IF 1.5 | Zone 4, Computer Science | JCR Q3: COMPUTER SCIENCE, INFORMATION SYSTEMS

International Journal of Network Management Pub Date : 2024-08-18 DOI:10.1002/nem.2298

Babar Ali, Muhammed Golec, Sukhpal Singh Gill, Felix Cuadrado, Steve Uhlig


Citations: 0

Abstract

Deep neural network (DNN) and machine learning (ML) models produce highly accurate inference results but demand enormous computational resources. The limited capacity of end-user smart gadgets drives companies to exploit computational resources across the edge-to-cloud continuum and to host applications at user-facing locations for users who require fast responses. Kubernetes-hosted inference with poor resource request estimation results in service level agreement (SLA) violations in terms of latency, and in below-par performance with higher end-to-end (E2E) delays. Static resource provisioning over an application's lifetime either hurts user experience through under-provisioning or incurs unnecessary cost through over-provisioning. Dynamic scaling can remedy delay by scaling up, at additional cost, whereas a simple migration to another location that offers latency within SLA bounds can reduce delay while minimizing cost. To address these cost and delay challenges for ML inference in the inherently heterogeneous, resource-constrained, and distributed edge environment, we propose ProKube, a proactive container scaling and migration orchestrator that dynamically adjusts resources and container locations with a fair balance between cost and delay. ProKube is developed in conjunction with Google Kubernetes Engine (GKE), enabling cross-cluster migration and/or dynamic scaling. It further supports regularly incorporating freshly collected logs into scheduling decisions to handle unpredictable network behavior. Experiments conducted in heterogeneous edge settings show the efficacy of ProKube relative to its counterparts: cost greedy (CG), latency greedy (LG), and GeKube (GK). ProKube reduces SLA violations by 68%, 7%, and 64% relative to CG, LG, and GK, respectively; it improves cost by 4.77 cores relative to LG, while incurring an additional cost of 3.94 relative to CG and GK.
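The trade-off described in the abstract (scaling up adds cost, while migrating to a location whose latency is within SLA bounds may reduce delay without extra cores) can be illustrated with a minimal Python sketch. This is not ProKube's actual algorithm, which the abstract does not specify; the `Option` class, the `choose_action` function, the weighted score, and every number below are hypothetical and serve only to show how a cost/delay balance might be expressed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Option:
    """A candidate placement (hypothetical model, not from the paper)."""
    location: str                 # cluster or edge site name
    cores: float                  # CPU cores requested, used as the cost proxy
    predicted_latency_ms: float   # forecast end-to-end latency for this option


def choose_action(options: List[Option], sla_ms: float,
                  weight: float = 0.5) -> Optional[Option]:
    """Return the SLA-feasible option with the lowest weighted cost/delay score.

    weight=1.0 considers cost only; weight=0.0 considers delay only.
    Returns None when no option is predicted to meet the SLA.
    """
    feasible = [o for o in options if o.predicted_latency_ms <= sla_ms]
    if not feasible:
        return None
    # Normalise both terms to roughly [0, 1] so the weight is meaningful.
    max_cores = max(o.cores for o in feasible) or 1.0

    def score(o: Option) -> float:
        cost_term = o.cores / max_cores
        delay_term = o.predicted_latency_ms / sla_ms
        return weight * cost_term + (1.0 - weight) * delay_term

    return min(feasible, key=score)


# Illustrative candidates: stay as-is, scale up in place, or migrate.
options = [
    Option("edge-a", cores=1.0, predicted_latency_ms=250.0),  # violates the SLA
    Option("edge-a", cores=4.0, predicted_latency_ms=150.0),  # scale up
    Option("edge-b", cores=1.0, predicted_latency_ms=180.0),  # migrate
]
balanced = choose_action(options, sla_ms=200.0)
delay_only = choose_action(options, sla_ms=200.0, weight=0.0)
```

With these made-up numbers, the balanced setting prefers migration to `edge-b` (same cores, latency within the SLA), whereas a delay-only weighting prefers the four-core scale-up on `edge-a`.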

Source journal

International Journal of Network Management COMPUTER SCIENCE, INFORMATION SYSTEMS-TELECOMMUNICATIONS

CiteScore

5.10

Self-citation rate

6.70%

Review time

>12 weeks

About the journal: Modern computer networks and communication systems are increasing in size, scope, and heterogeneity. The promise of a single end-to-end technology has not been realized and likely never will be. The decreasing cost of bandwidth is extending the possible applications of computer networks and communication systems to entirely new domains. Problems in integrating heterogeneous wired and wireless technologies, ensuring security and quality of service, and reliably operating large-scale systems, including cloud computing, have all emerged as important topics. The one constant is the need for network management. Challenges in network management have never been greater than they are today. The International Journal of Network Management is the forum for researchers, developers, and practitioners in network management to present their work to an international audience. The journal is dedicated to the dissemination of information that will enable improved management, operation, and maintenance of computer networks and communication systems. The journal is peer reviewed and publishes original papers (both theoretical and experimental) by leading researchers, practitioners, and consultants from universities, research laboratories, and companies around the world. Issues with thematic or guest-edited special topics typically occur several times per year. Topic areas for the journal are largely defined by the taxonomy for network and service management developed by IFIP WG6.6, together with IEEE-CNOM, the IRTF-NMRG, and the Emanics Network of Excellence.