{"title":"Integrating asynchronous advantage actor–critic (A3C) and coalitional game theory algorithms for optimizing energy, carbon emissions, and reliability of scientific workflows in cloud data centers","authors":"Mustafa Ibrahim Khaleel","doi":"10.1016/j.swevo.2024.101756","DOIUrl":null,"url":null,"abstract":"<div><div>The growth of workflow as a service (WFaaS) has become more intricate with the increasing variety and number of workflow module applications and expanding computing resources. This complexity leads to higher energy consumption in data centers, negatively impacting the environment and extending processing times. Striking a balance between reducing energy and carbon emissions and maintaining scheduling reliability is challenging. While deep reinforcement learning (DRL) approaches have shown significant success in workflow scheduling, they require extensive training time and data due to application homogeneity and sparse rewards, and they do not always guarantee effective convergence. On the other hand, experts have developed various scheduling policies that perform well for different optimization goals, but these heuristic strategies lack adaptability to environmental changes and specific workflow optimization. To address these challenges, an enhanced asynchronous advantage actor–critic (A3C) method combined with merge-and-split-based coalitional game theory is proposed. This approach effectively guides DRL learning in large-scale dynamic scheduling issues using optimal policies from the expert pool. The merge-and-split-based method prioritizes computing nodes based on their preemptive characteristics and resource heterogeneity, ensuring reliability-aware workflow scheduling that maps applications to computing resources while considering the dynamic nature of energy costs and carbon footprints. 
Experiments on real and synthesized workflows show that the proposed algorithm can learn high-quality scheduling policies for various workflows and optimization objectives, achieving energy efficiency improvements of 7.65% to 19.32%, carbon emission reductions of 3.13% to 14.76%, and reliability enhancements of 17.22% to 41.65%.</div></div>","PeriodicalId":48682,"journal":{"name":"Swarm and Evolutionary Computation","volume":"92 ","pages":"Article 101756"},"PeriodicalIF":8.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Swarm and Evolutionary Computation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2210650224002943","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Integrating asynchronous advantage actor–critic (A3C) and coalitional game theory algorithms for optimizing energy, carbon emissions, and reliability of scientific workflows in cloud data centers
Workflow as a service (WFaaS) has grown more intricate with the increasing variety and number of workflow module applications and expanding computing resources. This complexity raises energy consumption in data centers, harming the environment and extending processing times. Striking a balance between reducing energy and carbon emissions and maintaining scheduling reliability is challenging. While deep reinforcement learning (DRL) approaches have shown significant success in workflow scheduling, they require extensive training time and data due to application homogeneity and sparse rewards, and they do not always guarantee effective convergence. On the other hand, experts have developed various scheduling policies that perform well for different optimization goals, but these heuristic strategies lack adaptability to environmental changes and specific workflow optimization. To address these challenges, an enhanced asynchronous advantage actor–critic (A3C) method combined with merge-and-split-based coalitional game theory is proposed. This approach effectively guides DRL learning in large-scale dynamic scheduling problems using optimal policies from the expert pool. The merge-and-split-based method prioritizes computing nodes based on their preemptive characteristics and resource heterogeneity, ensuring reliability-aware workflow scheduling that maps applications to computing resources while considering the dynamic nature of energy costs and carbon footprints. Experiments on real and synthesized workflows show that the proposed algorithm can learn high-quality scheduling policies for various workflows and optimization objectives, achieving energy efficiency improvements of 7.65% to 19.32%, carbon emission reductions of 3.13% to 14.76%, and reliability enhancements of 17.22% to 41.65%.
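The merge-and-split coalition formation mentioned in the abstract can be sketched as a simple fixed-point loop: coalitions of compute nodes merge when a merged group is worth more than its parts, and split when partitioning a group raises total value. The utility function, node names, and synergy/penalty constants below are illustrative assumptions for a toy demonstration, not the paper's actual reliability/energy/carbon model:

```python
from itertools import combinations

def coalition_value(coalition, capacity):
    # Hypothetical utility: summed node capacity plus a small pairing
    # synergy, minus a coordination penalty that grows quadratically with
    # coalition size (illustrative constants, not from the paper).
    k = len(coalition)
    return sum(capacity[n] for n in coalition) + 1.0 * (k - 1) - 0.4 * (k - 1) ** 2

def merge_and_split(nodes, capacity):
    # Start from singleton coalitions.
    partition = [frozenset([n]) for n in nodes]
    changed = True
    while changed:
        changed = False
        # Merge rule: join two coalitions when the merged value exceeds
        # the sum of their separate values.
        for a, b in combinations(partition, 2):
            if coalition_value(a | b, capacity) > (
                    coalition_value(a, capacity) + coalition_value(b, capacity)):
                partition.remove(a)
                partition.remove(b)
                partition.append(a | b)
                changed = True
                break
        if changed:
            continue
        # Split rule: break a coalition in two when that raises total value.
        for c in partition:
            if len(c) < 2:
                continue
            members = sorted(c)
            for r in range(1, len(members)):
                for left_tuple in combinations(members, r):
                    left = frozenset(left_tuple)
                    right = c - left
                    if (coalition_value(left, capacity)
                            + coalition_value(right, capacity)
                            > coalition_value(c, capacity)):
                        partition.remove(c)
                        partition.extend([left, right])
                        changed = True
                        break
                if changed:
                    break
            if changed:
                break
    return partition
```

Under this toy utility, pairs are worth forming but triples are not, so the loop stabilizes at a partition of pairs and singletons; in the paper's setting the value function would instead encode node reliability, energy cost, and carbon footprint.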
Journal Introduction:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include, but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.