A resource optimization scheduling model and algorithm for heterogeneous computing clusters based on GNN and RL

Zhen Zhang, Chen Xu, Kun Liu, Shaohua Xu, Long Huang
The Journal of Supercomputing, published 2024-08-03. DOI: 10.1007/s11227-024-06383-4

Abstract

In heterogeneous computing, efficient resource allocation is pivotal to system performance. However, user-submitted tasks are often complex and have varied resource demands. Moreover, resource states on such platforms change dynamically, and resources differ in type and capability, which makes the system environment highly intricate. To this end, we propose MD-HRL, a scheduling algorithm based on hierarchical reinforcement learning. The algorithm jointly balances task completion time, device power consumption, and load balancing. It contains a high-level agent (H-Agent) for task selection and a low-level agent (L-Agent) for resource allocation. The H-Agent leverages multi-hop attention graph neural networks (MAGNA) and one-dimensional convolutional neural networks (1DCNN) to encode task and resource information. Kolmogorov–Arnold networks are then employed to integrate these representations and calculate subtask priority scores. The L-Agent uses a double deep Q-network to approximate the best strategy and objective function, thereby optimizing the task-to-resource mapping in a dynamic environment. Experimental results demonstrate that MD-HRL outperforms several state-of-the-art baselines. On average, it reduces makespan by 12.54%, improves load balancing by 5.83%, and lowers power consumption by 6.36% compared with the second-best method.
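
To illustrate the double deep Q-network idea that the L-Agent relies on, the sketch below computes a generic Double DQN learning target for one transition. It is not the authors' implementation: the function name, the list-based Q-values, the discount factor, and the reading of each action index as a candidate resource are all assumptions made for illustration.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Generic Double DQN target for a single transition (illustrative only).

    next_q_online / next_q_target hold one Q-value per action in the next
    state. Here each action index is assumed to stand for a candidate
    resource; that mapping is hypothetical, not taken from the paper.
    """
    if done:
        # Terminal transition: no bootstrapped future value.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates it, which reduces the
    # overestimation bias of taking a max over a single network.
    return reward + gamma * next_q_target[best_action]


# Hypothetical example with three candidate resources.
target = double_dqn_target(
    reward=1.0,
    next_q_online=[0.2, 0.9, 0.5],
    next_q_target=[1.0, 2.0, 3.0],
    gamma=0.5,
)
```

In this example the online network prefers resource 1, so the target bootstraps from the target network's value for resource 1 instead of the target network's own maximum. Decoupling action selection from action evaluation in this way is what distinguishes Double DQN from plain DQN.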

