Multi-agent reinforcement learning for network routing in integrated access backhaul networks

IF 4.8 · Zone 3 (Computer Science) · JCR Q1, Computer Science, Information Systems

Ad Hoc Networks · Published: 2023-11-08 · DOI: 10.1016/j.adhoc.2023.103347

Shahaf Yamin, Haim H. Permuter

Volume 153, Article 103347

Citations: 0

Abstract

In this study, we examine the problem of downlink wireless routing in integrated access backhaul (IAB) networks comprising fiber-connected base stations, wireless base stations, and multiple users. Physical constraints prevent the use of a central controller, so each base station has only limited access to real-time network conditions. These networks operate in a time-slotted regime, in which base stations monitor network conditions and forward packets accordingly. Our objective is to maximize the packet arrival ratio while simultaneously minimizing packet latency. To this end, we formulate the problem as a multi-agent partially observable Markov decision process (POMDP). We then develop an algorithm that combines multi-agent reinforcement learning (MARL) with Advantage Actor Critic (A2C) to derive a joint routing policy in a distributed manner. Because a packet's destination is central to making a successful routing decision, we use information about similar destinations as the basis for destination-specific routing decisions. We characterize the similarity between destinations by their relational base-station associations, i.e., the base station to which each destination is currently connected; hence, the algorithm is called Relational Advantage Actor Critic (Relational A2C). To the best of our knowledge, this is the first work to optimize the routing strategy for IAB networks. Further, we present three training paradigms for this algorithm that provide flexibility in terms of performance and throughput. Numerical experiments on different network scenarios demonstrate that the Relational A2C algorithms achieve near-centralized performance even though they operate in a decentralized manner. Based on these experiments, we compare Relational A2C with other reinforcement learning algorithms, such as Q-Routing and Hybrid Routing. This comparison illustrates that solving the joint optimization problem increases network efficiency and reduces selfish agent behavior.
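The abstract names three ingredients (per-base-station routing agents, an Advantage Actor Critic update, and destination similarity expressed through base-station association) without showing how they fit together. The following is a minimal sketch under stated assumptions, not the authors' method: it is a tabular, single-agent, one-step actor-critic in which the TD error serves as the advantage estimate, and the actor and critic tables are keyed by the destination's current serving base station so that users behind the same base station share what is learned. All names (`relational_key`, `TabularA2CRouter`, `arrival_ratio_and_latency`, `train_toy`), the toy reward of +1/-1, and the learning rates are hypothetical illustrations.

```python
import math
import random
from collections import defaultdict


def relational_key(destination_user, user_to_bs):
    """Relational grouping (assumption): key a routing decision by the base station
    the destination user is currently associated with, so all users behind the
    same base station share one set of actor/critic statistics."""
    return user_to_bs[destination_user]


class TabularA2CRouter:
    """Hypothetical tabular one-step advantage actor-critic for ONE base station.

    Actor: softmax over next-hop preferences per relational key.
    Critic: state value V(key). Advantage: one-step TD error.
    """

    def __init__(self, neighbors, gamma=0.95, actor_lr=0.1, critic_lr=0.1, seed=0):
        self.neighbors = list(neighbors)  # candidate next hops
        self.gamma = gamma
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.prefs = defaultdict(lambda: [0.0] * len(self.neighbors))  # actor logits
        self.values = defaultdict(float)  # critic estimates V(key)
        self.rng = random.Random(seed)

    def policy(self, key):
        """Numerically stable softmax over the next-hop preferences for `key`."""
        logits = self.prefs[key]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, key):
        """Sample a next-hop index from the current policy."""
        probs = self.policy(key)
        return self.rng.choices(range(len(self.neighbors)), weights=probs)[0]

    def update(self, key, action, reward, next_key=None, done=False):
        """One actor-critic step; returns the advantage estimate used."""
        bootstrap = 0.0 if done else self.gamma * self.values[next_key]
        advantage = reward + bootstrap - self.values[key]
        self.values[key] += self.critic_lr * advantage  # critic: move V toward target
        probs = self.policy(key)
        for a in range(len(self.neighbors)):
            # d log pi(action) / d logit_a = 1[a == action] - pi(a)
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.prefs[key][a] += self.actor_lr * advantage * grad
        return advantage


def arrival_ratio_and_latency(delivered_latencies, num_generated):
    """Fraction of generated packets delivered, and mean latency (in slots) of those delivered."""
    ratio = len(delivered_latencies) / num_generated if num_generated else 0.0
    mean_latency = (sum(delivered_latencies) / len(delivered_latencies)
                    if delivered_latencies else float("nan"))
    return ratio, mean_latency


def train_toy(steps=300, seed=1):
    """Toy single-hop task (assumption): forwarding to the destination's serving
    base station delivers the packet (+1); the other neighbor is penalized (-1)."""
    user_to_bs = {"u1": "bs_B", "u2": "bs_B", "u3": "bs_A"}
    router = TabularA2CRouter(neighbors=["bs_A", "bs_B"], seed=seed)
    for t in range(steps):
        dest = "u1" if t % 2 == 0 else "u2"  # two users, same serving base station
        key = relational_key(dest, user_to_bs)
        a = router.act(key)
        reward = 1.0 if router.neighbors[a] == key else -1.0
        router.update(key, a, reward, done=True)
    return router


if __name__ == "__main__":
    router = train_toy()
    print(router.policy("bs_B"))  # probability mass concentrates on "bs_B"
    print(arrival_ratio_and_latency([2, 3, 4], num_generated=4))
```

Keying the tables by serving base station rather than by individual user means that experience gathered for packets to `u1` also trains the decision for `u2`; a per-user key would learn each destination separately. The multi-agent coordination and the three training paradigms mentioned in the abstract are not modeled in this sketch.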


Source journal

Ad Hoc Networks (Engineering and Technology: Telecommunications)

CiteScore: 10.20
Self-citation rate: 4.20%
Articles published: 131
Review time: 4.8 months

Journal description: Ad Hoc Networks is an international archival journal that provides a publication venue covering all topics of interest to those working in ad hoc and sensor networking. It considers original, high-quality, unpublished contributions addressing all aspects of ad hoc and sensor networks. Specific areas of interest include, but are not limited to: Mobile and Wireless Ad Hoc Networks; Sensor Networks; Wireless Local and Personal Area Networks; Home Networks; Ad Hoc Networks of Autonomous Intelligent Systems; Novel Architectures for Ad Hoc and Sensor Networks; Self-organizing Network Architectures and Protocols; Transport Layer Protocols; Routing Protocols (unicast, multicast, geocast, etc.); Media Access Control Techniques; Error Control Schemes; Power-Aware, Low-Power, and Energy-Efficient Designs; Synchronization and Scheduling Issues; Mobility Management; Mobility-Tolerant Communication Protocols; Location Tracking and Location-based Services; Resource and Information Management; Security and Fault-Tolerance Issues; Hardware and Software Platforms, Systems, and Testbeds; Experimental and Prototype Results; Quality-of-Service Issues; Cross-Layer Interactions; Scalability Issues; and Performance Analysis and Simulation of Protocols.