{"title":"Integrating asynchronous advantage actor–critic (A3C) and coalitional game theory algorithms for optimizing energy, carbon emissions, and reliability of scientific workflows in cloud data centers","authors":"Mustafa Ibrahim Khaleel","doi":"10.1016/j.swevo.2024.101756","DOIUrl":null,"url":null,"abstract":"<div><div>The growth of workflow as a service (WFaaS) has become more intricate with the increasing variety and number of workflow module applications and expanding computing resources. This complexity leads to higher energy consumption in data centers, negatively impacting the environment and extending processing times. Striking a balance between reducing energy and carbon emissions and maintaining scheduling reliability is challenging. While deep reinforcement learning (DRL) approaches have shown significant success in workflow scheduling, they require extensive training time and data due to application homogeneity and sparse rewards, and they do not always guarantee effective convergence. On the other hand, experts have developed various scheduling policies that perform well for different optimization goals, but these heuristic strategies lack adaptability to environmental changes and specific workflow optimization. To address these challenges, an enhanced asynchronous advantage actor–critic (A3C) method combined with merge-and-split-based coalitional game theory is proposed. This approach effectively guides DRL learning in large-scale dynamic scheduling issues using optimal policies from the expert pool. The merge-and-split-based method prioritizes computing nodes based on their preemptive characteristics and resource heterogeneity, ensuring reliability-aware workflow scheduling that maps applications to computing resources while considering the dynamic nature of energy costs and carbon footprints. 
Experiments on real and synthesized workflows show that the proposed algorithm can learn high-quality scheduling policies for various workflows and optimization objectives, achieving energy efficiency improvements of 7.65% to 19.32%, carbon emission reductions of 3.13% to 14.76%, and reliability enhancements of 17.22% to 41.65%.</div></div>","PeriodicalId":48682,"journal":{"name":"Swarm and Evolutionary Computation","volume":"92 ","pages":"Article 101756"},"PeriodicalIF":8.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Swarm and Evolutionary Computation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2210650224002943","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Integrating asynchronous advantage actor–critic (A3C) and coalitional game theory algorithms for optimizing energy, carbon emissions, and reliability of scientific workflows in cloud data centers
Workflow as a service (WFaaS) has grown more intricate with the increasing variety and number of workflow module applications and expanding computing resources. This complexity raises energy consumption in data centers, harming the environment and extending processing times. Striking a balance between reducing energy and carbon emissions and maintaining scheduling reliability is challenging. While deep reinforcement learning (DRL) approaches have shown significant success in workflow scheduling, they require extensive training time and data due to application homogeneity and sparse rewards, and they do not always guarantee effective convergence. On the other hand, experts have developed various scheduling policies that perform well for different optimization goals, but these heuristic strategies lack adaptability to environmental changes and specific workflow optimization. To address these challenges, an enhanced asynchronous advantage actor–critic (A3C) method combined with merge-and-split-based coalitional game theory is proposed. This approach effectively guides DRL learning in large-scale dynamic scheduling problems using optimal policies from the expert pool. The merge-and-split-based method prioritizes computing nodes based on their preemptive characteristics and resource heterogeneity, ensuring reliability-aware workflow scheduling that maps applications to computing resources while considering the dynamic nature of energy costs and carbon footprints. Experiments on real and synthesized workflows show that the proposed algorithm can learn high-quality scheduling policies for various workflows and optimization objectives, achieving energy efficiency improvements of 7.65% to 19.32%, carbon emission reductions of 3.13% to 14.76%, and reliability enhancements of 17.22% to 41.65%.
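The merge-and-split coalition formation mentioned in the abstract can be sketched as a simple fixed-point loop: coalitions of compute nodes merge when a merged group is worth more than its parts, and split when partitioning a group raises total value. The utility function, node names, and synergy/penalty constants below are illustrative assumptions for a toy demonstration, not the paper's actual reliability/energy/carbon model:

```python
from itertools import combinations

def coalition_value(coalition, capacity):
    # Hypothetical utility: summed node capacity plus a small pairing
    # synergy, minus a coordination penalty that grows quadratically with
    # coalition size (illustrative constants, not from the paper).
    k = len(coalition)
    return sum(capacity[n] for n in coalition) + 1.0 * (k - 1) - 0.4 * (k - 1) ** 2

def merge_and_split(nodes, capacity):
    # Start from singleton coalitions.
    partition = [frozenset([n]) for n in nodes]
    changed = True
    while changed:
        changed = False
        # Merge rule: join two coalitions when the merged value exceeds
        # the sum of their separate values.
        for a, b in combinations(partition, 2):
            if coalition_value(a | b, capacity) > (
                    coalition_value(a, capacity) + coalition_value(b, capacity)):
                partition.remove(a)
                partition.remove(b)
                partition.append(a | b)
                changed = True
                break
        if changed:
            continue
        # Split rule: break a coalition in two when that raises total value.
        for c in partition:
            if len(c) < 2:
                continue
            members = sorted(c)
            for r in range(1, len(members)):
                for left_tuple in combinations(members, r):
                    left = frozenset(left_tuple)
                    right = c - left
                    if (coalition_value(left, capacity)
                            + coalition_value(right, capacity)
                            > coalition_value(c, capacity)):
                        partition.remove(c)
                        partition.extend([left, right])
                        changed = True
                        break
                if changed:
                    break
            if changed:
                break
    return partition
```

Under this toy utility, pairs are worth forming but triples are not, so the loop stabilizes at a partition of pairs and singletons; in the paper's setting the value function would instead encode node reliability, energy cost, and carbon footprint.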
Journal Introduction:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include, but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.