Adaptive CPU sharing for co-located latency-critical JVM applications and batch jobs under dynamic workloads

Impact Factor 6.2 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Theory & Methods)
Dishi Xu, Fagui Liu, Bin Wang, Xuhao Tang, Qingbo Wu
Journal: Future Generation Computer Systems: The International Journal of eScience, Vol. 180, Article 108387
DOI: 10.1016/j.future.2026.108387
Published online: 2026-01-23 (issue date: 2026-07-01)
URL: https://www.sciencedirect.com/science/article/pii/S0167739X2600021X
Citations: 0

Abstract

Latency-critical (LC) long-running applications operating on Java Virtual Machines (JLRAs) often rely on substantial CPU over-provisioning to meet Service-Level Objectives (SLOs) under dynamic workloads, leading to significant resource underutilization. Additionally, JLRAs exhibit poor cold-start performance, so frequently deleting and creating application instances to adjust resource allocation degrades performance. Furthermore, harvesting redundant resources by deploying best-effort (BE) batch jobs alongside JLRAs faces serious challenges due to contention for shared CPU resources. We therefore present ChaosRM, a bi-level resource management framework for JVM workload co-location that improves resource utilization efficiency while eliminating resource contention. In contrast to the conventional approach of isolating JLRAs and batch jobs on non-overlapping CPU sets, ChaosRM proposes a tri-zone CPU isolation mechanism: two CPU zones isolate JLRAs and batch jobs, respectively, and a shared zone concurrently executes threads from both. An application-wide, learning-based Application Manager adjusts the instance states of JLRAs based on the global workload and adaptively learns both the shared-zone allocation strategy and a performance target expressed as thread queuing time; the Node Manager on each server heuristically binds CPU sets to JLRAs and dynamically schedules batch jobs among CPU zones according to this performance target and the JLRA instance states. Experimental results show that, while guaranteeing the SLOs of JLRAs, ChaosRM reduces the completion time of batch jobs by up to 14.10% over the best-performing baseline and by up to 54.29% across all baselines.
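The tri-zone mechanism above partitions each server's cores into a JLRA zone, a batch zone, and a shared zone that threads from both sides may use. The paper's implementation is not reproduced on this page; the sketch below is a minimal illustration (all function and type names are hypothetical) of how such a partition could be computed and rendered as Linux cpuset strings, with JLRA threads bound to their zone plus the shared zone and batch threads to the batch zone plus the shared zone.

```python
from dataclasses import dataclass

@dataclass
class TriZonePlan:
    """Hypothetical tri-zone CPU partition: dedicated JLRA cores,
    dedicated batch cores, and a shared zone both may use."""
    jlra: list[int]
    batch: list[int]
    shared: list[int]

def plan_tri_zone(total_cores: int, jlra_cores: int, batch_cores: int) -> TriZonePlan:
    """Split core IDs 0..total_cores-1 into three contiguous zones.
    Cores left over after the two dedicated zones form the shared zone,
    whose size the Application Manager would adjust at runtime."""
    if jlra_cores + batch_cores > total_cores:
        raise ValueError("dedicated zones exceed available cores")
    cores = list(range(total_cores))
    jlra = cores[:jlra_cores]
    batch = cores[jlra_cores:jlra_cores + batch_cores]
    shared = cores[jlra_cores + batch_cores:]
    return TriZonePlan(jlra, batch, shared)

def cpuset_str(cores: list[int]) -> str:
    """Render a sorted core list as a cpuset string, e.g. [0,1,2,5] -> '0-2,5'."""
    if not cores:
        return ""
    ranges = []
    start = prev = cores[0]
    for c in cores[1:]:
        if c == prev + 1:
            prev = c
        else:
            ranges.append((start, prev))
            start = prev = c
    ranges.append((start, prev))
    return ",".join(f"{a}-{b}" if a != b else f"{a}" for a, b in ranges)

# Example: 16 cores, 8 dedicated to JLRAs, 4 to batch jobs, 4 shared.
plan = plan_tri_zone(total_cores=16, jlra_cores=8, batch_cores=4)
print(cpuset_str(plan.jlra + plan.shared))   # cpuset for JLRA threads -> 0-7,12-15
print(cpuset_str(plan.batch + plan.shared))  # cpuset for batch threads -> 8-15
```

In a real deployment the emitted strings would be written to the containers' cpuset controllers (e.g. `cpuset.cpus` under cgroup v2); this sketch only covers the partition arithmetic, not the scheduling policy ChaosRM learns.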
Source journal metrics
CiteScore: 19.90
Self-citation rate: 2.70%
Articles per year: 376
Review time: 10.6 months
Journal profile: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.