ProKube: Proactive Kubernetes Orchestrator for Inference in Heterogeneous Edge Computing

Impact factor 1.5; Zone 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Babar Ali, Muhammed Golec, Sukhpal Singh Gill, Felix Cuadrado, Steve Uhlig
International Journal of Network Management, vol. 35, no. 1. Published 2024-08-18. DOI: 10.1002/nem.2298
https://onlinelibrary.wiley.com/doi/10.1002/nem.2298
Citations: 0

Abstract

Deep neural network (DNN) and machine learning (ML) inference produces highly accurate results but demands enormous computational resources. The limited capacity of end-user smart gadgets drives companies to exploit computational resources across the edge-to-cloud continuum and to host applications at user-facing locations, where users require fast responses. When inference is hosted on Kubernetes, poor estimation of resource requests results in service level agreement (SLA) violations in terms of latency and in below-par performance with higher end-to-end (E2E) delays. Static resource provisioning over the lifetime of a workload either hurts user experience through under-provisioning or incurs unnecessary cost through over-provisioning. Dynamic scaling can remedy delay by upscaling, at additional cost, whereas a simple migration to another location that offers latency within SLA bounds can reduce delay while minimizing cost. To address these cost and delay challenges for ML inference in the inherently heterogeneous, resource-constrained, and distributed edge environment, we propose ProKube, a proactive container scaling and migration orchestrator that dynamically adjusts resources and container locations to strike a fair balance between cost and delay. ProKube is developed in conjunction with Google Kubernetes Engine (GKE), enabling cross-cluster migration and/or dynamic scaling. It further supports the regular incorporation of freshly collected logs into scheduling decisions to handle unpredictable network behavior. Experiments conducted in heterogeneous edge settings show the efficacy of ProKube against its counterparts: cost greedy (CG), latency greedy (LG), and GeKube (GK). ProKube reduces SLA violations by 68%, 7%, and 64% relative to CG, LG, and GK, respectively; it improves cost by 4.77 cores relative to LG and incurs an additional cost of 3.94 compared with CG and GK.
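
The trade-off the abstract describes (scale up in place versus migrate to a location whose latency is already within SLA bounds) can be illustrated with a small sketch. This is not ProKube's algorithm, which the abstract does not specify. The `Candidate` type, the three selection functions, and all numbers below are hypothetical, and CPU cores are assumed to stand in for cost. The cost-greedy and latency-greedy functions are only a plain reading of the baseline names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """One hypothetical (location, resource size) option for a container."""
    location: str
    cores: float                 # requested CPU cores, used here as the cost proxy
    predicted_latency_ms: float  # forecast end-to-end latency for this option


def cost_greedy(cands: List[Candidate]) -> Candidate:
    """Pick the cheapest option, ignoring the SLA (latency only breaks ties)."""
    return min(cands, key=lambda c: (c.cores, c.predicted_latency_ms))


def latency_greedy(cands: List[Candidate]) -> Candidate:
    """Pick the fastest option, ignoring cost (cores only break ties)."""
    return min(cands, key=lambda c: (c.predicted_latency_ms, c.cores))


def sla_aware(cands: List[Candidate], sla_ms: float) -> Candidate:
    """Assumed balanced policy: cheapest option whose latency meets the SLA.

    Falls back to the fastest option when nothing is predicted to meet it.
    """
    feasible = [c for c in cands if c.predicted_latency_ms <= sla_ms]
    if not feasible:
        return latency_greedy(cands)
    return cost_greedy(feasible)


# Illustrative values only; none of these come from the paper.
candidates = [
    Candidate("edge-a", 1.0, 140.0),  # stay in place, no scaling
    Candidate("edge-a", 4.0, 85.0),   # scale up in place
    Candidate("edge-b", 1.5, 90.0),   # migrate to another edge location
    Candidate("cloud", 2.0, 60.0),    # migrate to the cloud
]

choice = sla_aware(candidates, sla_ms=100.0)
print(choice.location, choice.cores)
```

With a 100 ms SLA, the cost-greedy choice stays on `edge-a` at 1.0 core and misses the SLA, the latency-greedy choice pays for the 2.0-core cloud option, and the SLA-aware choice migrates to `edge-b` at 1.5 cores, which is cheaper than scaling `edge-a` up to 4.0 cores.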

Source journal
International Journal of Network Management (COMPUTER SCIENCE, INFORMATION SYSTEMS; TELECOMMUNICATIONS)
CiteScore: 5.10
Self-citation rate: 6.70%
Articles published: 25
Review time: >12 weeks
Journal description: Modern computer networks and communication systems are increasing in size, scope, and heterogeneity. The promise of a single end-to-end technology has not been realized and likely never will occur. The decreasing cost of bandwidth is increasing the possible applications of computer networks and communication systems to entirely new domains. Problems in integrating heterogeneous wired and wireless technologies, ensuring security and quality of service, and reliably operating large-scale systems including the inclusion of cloud computing have all emerged as important topics. The one constant is the need for network management. Challenges in network management have never been greater than they are today. The International Journal of Network Management is the forum for researchers, developers, and practitioners in network management to present their work to an international audience. The journal is dedicated to the dissemination of information, which will enable improved management, operation, and maintenance of computer networks and communication systems. The journal is peer reviewed and publishes original papers (both theoretical and experimental) by leading researchers, practitioners, and consultants from universities, research laboratories, and companies around the world. Issues with thematic or guest-edited special topics typically occur several times per year. Topic areas for the journal are largely defined by the taxonomy for network and service management developed by IFIP WG6.6, together with IEEE-CNOM, the IRTF-NMRG and the Emanics Network of Excellence.