Reward shaping in DRL: A novel framework for adaptive resource management in dynamic environments

Impact Factor 8.1 · Q1 Computer Science · COMPUTER SCIENCE, INFORMATION SYSTEMS
Mario Chahoud, Hani Sami, Rabeb Mizouni, Jamal Bentahar, Azzam Mourad, Hadi Otrok, Chamseddine Talhi
DOI: 10.1016/j.ins.2025.122238
Journal: Information Sciences, Volume 715, Article 122238
Published: 2025-04-24
URL: https://www.sciencedirect.com/science/article/pii/S0020025525003706
Citations: 0

Abstract

In edge computing environments, efficient computation resource management is crucial for optimizing service allocation to hosts in the form of containers. These environments experience dynamic user demands and high mobility, making traditional static and heuristic-based methods inadequate for handling such complexity and variability. Deep Reinforcement Learning (DRL) offers a more adaptable solution, capable of responding to these dynamic conditions. However, existing DRL methods face challenges such as high reward variability, slow convergence, and difficulties in incorporating user mobility and rapidly changing environmental configurations. To overcome these challenges, we propose a novel DRL framework for computation resource optimization at the edge layer. This framework leverages a customized Markov Decision Process (MDP) and Proximal Policy Optimization (PPO), integrating a Graph Convolutional Transformer (GCT). By combining Graph Convolutional Networks (GCN) with Transformer encoders, the GCT introduces a spatio-temporal reward-shaping mechanism that enhances the agent's ability to select hosts and assign services efficiently in real time while minimizing overload. Our approach significantly enhances the speed and accuracy of resource allocation, achieving, on average across two datasets, a 30% reduction in convergence time, a 25% increase in total accumulated rewards, and a 35% improvement in service allocation efficiency compared to standard DRL methods and existing reward-shaping techniques. Our method was validated using two real-world datasets, the Mobile Data Challenge (MDC) and Shanghai Telecom, and was compared against standard DRL models, reward-shaping baselines, and heuristic methods.
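To make the reward-shaping idea concrete, here is a minimal, hypothetical sketch of potential-based reward shaping over a graph of edge hosts. It is not the paper's actual GCT architecture: the names (`graph_conv`, `potential`, `shaped_reward`), the line topology, and the choice of potential function are illustrative assumptions. The graph-convolution step is reduced to neighbourhood averaging, standing in for a learned GCN layer, and the Transformer/temporal component is omitted.

```python
# Hypothetical sketch of graph-aware, potential-based reward shaping.
# All names and parameters here are illustrative assumptions.

GAMMA = 0.99  # discount factor used by the shaping term


def graph_conv(loads, adj):
    """One crude graph-convolution step: each host's load becomes the
    mean of its own load and its neighbours' loads."""
    n = len(loads)
    out = []
    for i in range(n):
        neigh = [loads[j] for j in range(n) if adj[i][j]] + [loads[i]]
        out.append(sum(neigh) / len(neigh))
    return out


def potential(loads, adj):
    """Potential Phi(s): negative of the maximum smoothed load, so that
    states with less overload have higher potential."""
    return -max(graph_conv(loads, adj))


def shaped_reward(base_reward, loads, next_loads, adj):
    """Potential-based shaping, F = gamma * Phi(s') - Phi(s), a classic
    form that leaves the optimal policy unchanged."""
    return base_reward + GAMMA * potential(next_loads, adj) - potential(loads, adj)


# Three edge hosts in a line topology; an allocation step rebalances load.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
before = [0.9, 0.2, 0.1]   # host 0 overloaded
after = [0.5, 0.4, 0.3]    # load spread more evenly
r = shaped_reward(0.0, before, after, adj)
print(round(r, 4))  # positive: moving toward balance earns extra reward
```

The point of the sketch is only the shape of the mechanism: a graph-aware summary of host state feeds a potential function, and the difference of potentials augments the environment reward, densifying the learning signal without changing which policy is optimal.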
Source journal

Information Sciences (Engineering/Technology – Computer Science, Information Systems)

CiteScore: 14.00
Self-citation rate: 17.30%
Annual publications: 1322
Review time: 10.4 months

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.