Two-Stage Learning Approach for Semantic-Aware Task Scheduling in Container-Based Clouds

IF 5 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Information Systems)

IEEE Transactions on Cloud Computing Pub Date : 2024-12-19 DOI:10.1109/TCC.2024.3520101

Lilu Zhu;Kai Huang;Yanfeng Hu;Yang Wang

Volume 13, Issue 1, pp. 148-165 · Journal Article · https://ieeexplore.ieee.org/document/10810299/

Citations: 0

Abstract

Container-based task scheduling is critical for ensuring a reliable, flexible and cost-effective cloud computing mode. However, in different business cloud systems, state-of-the-art scheduling models are not as effective as those in the simulated world due to the sparsity issues associated with sample sizes and features. Herein, we propose a novel containerized task scheduling framework (SA2CTS) based on reinforcement learning (RL) that incorporates cross-modal contrastive learning (CL) loss. This framework optimizes the scheduler's understanding of the container-based cloud state in RL by adding a pretraining stage, promoting accurate scheduling action inference. Specifically, we design a two-stage learning pipeline. The initial stage involves pretraining the model on a large collection of aligned image-text pairs to extract fine-grained scheduling affinity features, and the high-level semantic representations of scheduling tasks are learned in the multimodal space. In the second stage, we fine-tune the pretrained model with multisource cluster feedback, i.e., build a mapping from state representations to scheduling actions through the RL paradigm, achieving task-oriented and semantic-aware scheduling. The experimental results obtained on three large-scale production cluster datasets substantiate that the proposed SA2CTS method can provide average convergence efficiency and resource utilization improvements of 17.57% and 10.42%, respectively, over the state-of-the-art RL scheduling methods.
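The abstract does not spell out the form of the cross-modal contrastive loss used in the pretraining stage. As a rough illustration only, the sketch below assumes a symmetric InfoNCE objective of the kind commonly used for aligned image-text pairs. The function names, the temperature value of 0.07, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_row(logits, target_index):
    """Cross-entropy of one row of logits against a single target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of aligned (image, text) pairs.

    Assumption: pair k in image_embs matches pair k in text_embs, and
    every other combination in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity matrix, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
        for i in imgs
    ]
    # Image-to-text direction: each row should pick its own column.
    loss_i2t = sum(cross_entropy_row(logits[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should pick its own row.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = sum(cross_entropy_row(cols[k], k) for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy batch: two pairs whose embeddings already agree, then the same
# images paired with swapped (mismatched) texts.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned = contrastive_loss(images, aligned_texts)
swapped = contrastive_loss(images, swapped_texts)
```

On this toy batch the aligned pairs give a much smaller loss than the swapped pairs, which is the property a pretraining stage of this kind relies on. The second stage described in the abstract (RL fine-tuning on cluster feedback) is not sketched here, because the abstract gives no detail about its state, action, or reward design.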


Source journal

IEEE Transactions on Cloud Computing (Computer Science: Software)

CiteScore: 9.40

Self-citation rate: 6.20%

Articles published: 167

About the journal: The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.