Interference Coordination for Autonomous HetNets Based on Adversarial Learning

2021 13th International Conference on Communication Software and Networks (ICCSN). Pub Date: 2021-06-04. DOI: 10.1109/ICCSN52437.2021.9463652

Mu Yan, Jian Yang


Citations: 1

Abstract

This paper proposes an intelligent inter-cell interference coordination (ICIC) scheme for autonomous heterogeneous networks (HetNets), in which small base stations (SBSs) sense the environment and agilely schedule sub-channels to individual users at each Transmission Time Interval (TTI), with the aim of mitigating interference and maximizing long-term throughput. Because only local network states, including the Signal to Interference plus Noise Ratio (SINR), can be observed in autonomous HetNets, the interference coordination decision-making process at the SBSs is modeled as a non-cooperative partially observable Markov decision process (POMDP) game whose goal is to reach a Nash equilibrium. Since the reward function is not explicit and only a few samples are available for prior training, we formulate the ICIC problem as a distributed inverse reinforcement learning (IRL) problem built on the POMDP game. Furthermore, we propose a self-imitating learning (SIL) algorithm that requires no prior knowledge and combines Wasserstein Generative Adversarial Networks (WGANs) with the Double Deep Q Network (Double DQN) algorithm to perform behavior imitation and few-shot learning, solving the IRL problem from both the policy and the value perspectives. To cater for the plug-and-play operation mode of indoor SBSs, the Double DQN is initialized according to the SINR, and a nested training scheme is adopted to overcome the slow-start problem of the learning process. Numerical results show that SIL can make decisions at the TTI level to solve the ICIC problem, and that it improves overall network throughput by up to 19.8% compared with other known benchmark algorithms.
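For readers unfamiliar with the Double DQN component named in the abstract, the following is a minimal sketch of the generic Double DQN bootstrap target, not the authors' implementation. The function name, the discount factor, and the interpretation of each action index as a candidate sub-channel assignment are assumptions made for illustration; the abstract does not specify the state, action, or reward encoding.

```python
from typing import Sequence


def double_dqn_target(
    reward: float,
    next_q_online: Sequence[float],
    next_q_target: Sequence[float],
    gamma: float = 0.9,   # hypothetical discount factor, not taken from the paper
    done: bool = False,
) -> float:
    """Generic Double DQN target for one transition.

    next_q_online / next_q_target hold the Q-values of every action in the
    next state, as estimated by the online and target networks. Here an
    action index is assumed to stand for one sub-channel assignment.
    """
    if done:
        # Terminal transition: no bootstrapping.
        return reward
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates it, which reduces the
    # over-estimation bias of taking max() over a single network.
    return reward + gamma * next_q_target[best_action]


# Usage: the online network prefers action 1, so the target network's
# value for action 1 (0.4) is used rather than its own maximum (2.0).
example = double_dqn_target(1.0, [0.2, 0.8, 0.5], [1.0, 0.4, 2.0], gamma=0.5)
```

Decoupling action selection from action evaluation is the defining difference from a plain DQN target, which would bootstrap from the target network's own maximum.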

