A GAN enhanced meta-deep reinforcement learning approach for DCN routing optimization

IF 14.7 · Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion · Pub Date: 2025-04-04 · DOI: 10.1016/j.inffus.2025.103160

Qing Guo, Wei Zhao, Zhuoheng Lyu, Tingting Zhao

Information Fusion, Volume 121, Article 103160. Full text: https://www.sciencedirect.com/science/article/pii/S1566253525002337

Citations: 0

Abstract

In large-scale data center networks (DCNs), dynamic changes in traffic and topology cause traffic patterns to follow a long-tail distribution: a small number of traffic patterns appear frequently, while most appear rarely. The resulting data are clearly non-IID (not independent and identically distributed), which poses a serious challenge for traffic routing optimization. Traditional deep reinforcement learning (DRL) methods often assume relatively stable data distributions, while approaches that combine DRL with meta-learning assume only small shifts in traffic or the environment. Such methods struggle with the sample scarcity and cross-scenario feature-space shifts that long-tail distributions cause. To address this issue, this paper proposes a global intelligent routing optimization scheme based on meta-deep reinforcement learning (Meta-DRL) enhanced by generative adversarial networks (GANs). First, a two-level nested Meta-DRL model is built: the lower level optimizes the policy for each specific task, while the upper level learns generalized global network parameters, improving the model's initialization quality and its generalization to unfamiliar network settings. Next, a mechanism is introduced that combines GAN-based feature encoding with a meta-learning discriminator, refining the GAN's feature discrimination boundary and greatly improving the quality of synthesized samples, especially where data are scarce. Furthermore, a parameter feature-space mapping mechanism is proposed to unify features from new and old tasks in a shared representation space, which avoids retraining the Meta-DRL model whenever the network environment changes and substantially improves the model's generalization and decision-making efficiency. Simulation results show that the GAN-enhanced Meta-DRL method significantly outperforms traditional DRL approaches and other meta-learning methods in convergence speed, accumulated reward, and network routing performance.
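The abstract does not give the update rules of the two-level nested model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy one-dimensional task family with loss `(theta - target)^2` and swaps in a Reptile-style first-order meta-update for the upper level. The names `inner_adapt` and `meta_train`, the learning rates, and the task targets are all hypothetical and chosen for illustration.

```python
import random


def inner_adapt(theta, target, lr=0.1, steps=5):
    """Lower level (assumed form): adapt parameters to one task.

    Runs a few gradient-descent steps on the toy loss (theta - target)^2.
    """
    for _ in range(steps):
        grad = 2.0 * (theta - target)  # derivative of (theta - target)^2
        theta -= lr * grad
    return theta


def meta_train(task_targets, meta_lr=0.1, iterations=200, seed=0):
    """Upper level (assumed form): learn a shared initialization.

    Reptile-style first-order update: sample a task, adapt to it, then
    move the shared initialization a small step toward the adapted value.
    """
    rng = random.Random(seed)
    theta0 = 0.0  # shared initialization, deliberately started far from the tasks
    for _ in range(iterations):
        target = rng.choice(task_targets)
        adapted = inner_adapt(theta0, target)
        theta0 += meta_lr * (adapted - theta0)
    return theta0


if __name__ == "__main__":
    theta0 = meta_train([1.0, 2.0, 3.0])
    new_task = 2.5  # an unseen task drawn from the same range
    err_meta = abs(inner_adapt(theta0, new_task) - new_task)
    err_zero = abs(inner_adapt(0.0, new_task) - new_task)
    print(f"meta-learned init: {theta0:.3f}")
    print(f"error after adaptation from meta init: {err_meta:.3f}")
    print(f"error after adaptation from zero init: {err_zero:.3f}")
```

In this toy setting the learned initialization settles inside the range of the training targets, so the same small number of adaptation steps lands closer to an unseen task than it does from an uninformed start. That is the sense in which an upper level can improve "initialization quality"; a real routing agent would replace the scalar `theta` with policy-network weights and the quadratic loss with a reinforcement learning objective.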

Source journal

Information Fusion (category: Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Publication volume: 161

Review time: 7.9 months

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.