Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments

Mario Chahoud, Hani Sami, Rabeb Mizouni, Jamal Bentahar, Azzam Mourad, Hadi Otrok, Chamseddine Talhi

Information Sciences, Volume 715, Article 122238 (published 2025-04-24)
DOI: 10.1016/j.ins.2025.122238
URL: https://www.sciencedirect.com/science/article/pii/S0020025525003706
Citations: 0
Abstract
In edge computing environments, efficient computation resource management is crucial for optimizing service allocation to hosts in the form of containers. These environments experience dynamic user demands and high mobility, making traditional static and heuristic-based methods inadequate for handling such complexity and variability. Deep Reinforcement Learning (DRL) offers a more adaptable solution, capable of responding to these dynamic conditions. However, existing DRL methods face challenges such as high reward variability, slow convergence, and difficulties in incorporating user mobility and rapidly changing environmental configurations. To overcome these challenges, we propose a novel DRL framework for computation resource optimization at the edge layer. This framework leverages a customized Markov Decision Process (MDP) and Proximal Policy Optimization (PPO), integrating a Graph Convolutional Transformer (GCT). By combining Graph Convolutional Networks (GCN) with Transformer encoders, the GCT introduces a spatio-temporal reward-shaping mechanism that enhances the agent's ability to select hosts and assign services efficiently in real time while minimizing overload. Our approach significantly enhances the speed and accuracy of resource allocation, achieving, on average across two datasets, a 30% reduction in convergence time, a 25% increase in total accumulated rewards, and a 35% improvement in service allocation efficiency compared to standard DRL methods and existing reward-shaping techniques. Our method was validated using two real-world datasets, the Mobile Data Challenge (MDC) and Shanghai Telecom, and was compared against standard DRL models, reward-shaping baselines, and heuristic methods.
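To make the reward-shaping idea concrete, the sketch below illustrates the standard potential-based shaping rule, r' = r + γΦ(s') − Φ(s), with the potential Φ computed from a single GCN layer over an edge-host graph. This is a minimal illustration only, not the authors' GCT: the real framework couples GCN layers with Transformer encoders for the temporal dimension, and all function names, feature choices, and dimensions here (`gcn_layer`, `potential`, the 3-host toy graph) are hypothetical.

```python
import numpy as np

def gcn_layer(adj, feats, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])                        # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))    # symmetric normalization
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weights, 0.0)

def potential(adj, feats, weights, readout):
    """Scalar potential Phi(s): mean-pool node embeddings, then a linear readout."""
    h = gcn_layer(adj, feats, weights)
    return float(h.mean(axis=0) @ readout)

def shaped_reward(r, phi_s, phi_s_next, gamma=0.99):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    This form is known to preserve the optimal policy of the original MDP."""
    return r + gamma * phi_s_next - phi_s

# Toy state: 3 edge hosts in a line graph, 2 features per host (e.g. load, queue length).
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 4))        # hypothetical GCN weights
readout = rng.normal(size=4)       # hypothetical readout vector

# Potentials before and after a service reallocation that balances host load.
phi_before = potential(adj, np.array([[0.9, 0.8], [0.2, 0.1], [0.5, 0.4]]), w, readout)
phi_after = potential(adj, np.array([[0.4, 0.3], [0.4, 0.3], [0.5, 0.4]]), w, readout)
print(shaped_reward(1.0, phi_before, phi_after))
```

In a PPO training loop, the shaped reward would simply replace the environment reward in the advantage estimate; because the shaping term telescopes over a trajectory, it densifies the learning signal without changing which policy is optimal.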
Journal introduction:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.