A resource optimization scheduling model and algorithm for heterogeneous computing clusters based on GNN and RL

The Journal of Supercomputing, Pub Date: 2024-08-03, DOI: 10.1007/s11227-024-06383-4

Zhen Zhang, Chen Xu, Kun Liu, Shaohua Xu, Long Huang


Citations: 0

Abstract

In heterogeneous computing, efficient resource allocation is pivotal for optimizing system performance. However, user-submitted tasks are often complex and have varied resource demands. Moreover, resource states on such platforms change dynamically, and resources differ in type and capability, which makes the system environment highly intricate. To this end, we propose MD-HRL, a scheduling algorithm based on hierarchical reinforcement learning that jointly balances task completion time, device power consumption, and load balancing. It contains a high-level agent (H-Agent) for task selection and a low-level agent (L-Agent) for resource allocation. The H-Agent uses multi-hop attention graph neural networks (MAGNA) and one-dimensional convolutional neural networks (1DCNN) to encode task and resource information. A Kolmogorov–Arnold network is then employed to integrate these representations and compute subtask priority scores. The L-Agent uses a double deep Q network to approximate the optimal policy and objective function, thereby optimizing the task-to-resource mapping in a dynamic environment. Experimental results demonstrate that MD-HRL outperforms several state-of-the-art baselines. On average, it reduces makespan by 12.54%, improves load balancing by 5.83%, and lowers power consumption by 6.36% compared with the second-best method.
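The abstract states that the L-Agent relies on a double deep Q network but gives no formulas. As a rough illustration only, the sketch below shows the generic double DQN target computation: the online network selects the next action and the target network evaluates it. The function name `double_dqn_targets`, its parameters, and the reading of each action as a candidate resource are assumptions made for this example; none of them come from the paper, and this is not the authors' implementation.

```python
from typing import List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_targets(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q_online_next: Sequence[Sequence[float]],
    q_target_next: Sequence[Sequence[float]],
    gamma: float = 0.99,
) -> List[float]:
    """Compute generic double DQN regression targets for a batch.

    Hypothetical interpretation: each action index stands for one
    candidate resource that a subtask could be mapped to.
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_online_next, q_target_next):
        if done:
            # Terminal transition: no bootstrapped future value.
            targets.append(r)
            continue
        # Online network chooses the action ...
        best_action = argmax(q_on)
        # ... and the target network supplies its value estimate.
        targets.append(r + gamma * q_tg[best_action])
    return targets


if __name__ == "__main__":
    print(double_dqn_targets(
        rewards=[1.0, 0.5],
        dones=[False, True],
        q_online_next=[[0.2, 0.9, 0.1], [0.3, 0.4, 0.5]],
        q_target_next=[[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]],
        gamma=0.5,
    ))  # prints [2.0, 0.5]
```

In the first transition a plain DQN target would take the maximum target-network value (3.0) and yield 2.5. Decoupling selection from evaluation gives 2.0 instead, which is how double DQN reduces overestimation of action values.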
