A Sequential Experience-driven Contextual Bandit Policy for MIMO TWAF Online Relay Selection

2022 IEEE 23rd International Workshop on Signal Processing Advances in Wireless Communication (SPAWC). Published: 2022-07-04. DOI: 10.1109/spawc51304.2022.9834018

Ankit Gupta, M. Sellathurai, T. Ratnarajah

Citations: 0

Abstract

In this work, we derive sequential experience-driven contextual bandit (CB)-based policies for online relay selection in multiple-input multiple-output (MIMO) two-way amplify-and-forward (TWAF) relay networks, where the relays are provided with quantized, imperfect channel gain information. The proposed CB-based policies learn which relay node is optimal by resolving the exploration-versus-exploitation dilemma. In particular, we propose a linear upper confidence bound (LinUCB)-based CB policy and an adaptive active greedy (AAG)-based CB policy that uses active learning heuristics. Through simulations, we show that the proposed CB-based policies reduce the feedback overhead by a factor of eight and the time cost by 70% while outperforming the best conventional Gram-Schmidt (GS) algorithm.
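To make the LinUCB idea concrete, the following is a minimal sketch of the standard disjoint LinUCB algorithm (Li et al., 2010) with each relay treated as an arm. It is not the authors' policy: the abstract does not define the context vector or the reward, so the 2-bit quantized "channel-gain" features, the linear reward model, and the names `LinUCB` and `simulate` are hypothetical, chosen only for illustration. Each relay keeps a ridge-regression estimate of its reward and is scored by its estimated reward plus an exploration bonus `alpha * sqrt(x^T A^-1 x)`. The bonus shrinks as that relay accumulates observations, which is how the policy trades exploration against exploitation.

```python
import math
import random


class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (here, per relay)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.dim = dim
        # A starts as the identity (ridge prior), so A_inv starts as the identity too.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
                      for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def select(self, contexts):
        """Return the arm whose upper confidence bound on expected reward is largest."""
        best_arm, best_ucb = 0, -math.inf
        for a, x in enumerate(contexts):
            Ainv_x = self._matvec(self.A_inv[a], x)
            theta = self._matvec(self.A_inv[a], self.b[a])  # ridge estimate A^-1 b
            mean = sum(t * xi for t, xi in zip(theta, x))
            width = self.alpha * math.sqrt(max(sum(xi * v for xi, v in zip(x, Ainv_x)), 0.0))
            if mean + width > best_ucb:
                best_arm, best_ucb = a, mean + width
        return best_arm

    def update(self, arm, x, reward):
        """Apply A <- A + x x^T through a Sherman-Morrison update of A_inv, and b <- b + r x."""
        Ainv = self.A_inv[arm]
        Ainv_x = self._matvec(Ainv, x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, Ainv_x))
        # A_inv is symmetric, so x^T A_inv equals (A_inv x)^T.
        self.A_inv[arm] = [[Ainv[i][j] - Ainv_x[i] * Ainv_x[j] / denom
                            for j in range(self.dim)] for i in range(self.dim)]
        self.b[arm] = [bi + reward * xi for bi, xi in zip(self.b[arm], x)]


def simulate(n_relays=4, dim=3, rounds=2000, seed=0):
    """Toy environment (hypothetical): return the fraction of optimal picks in the second half."""
    rng = random.Random(seed)
    # Hypothetical ground truth: each relay's expected reward is linear in its context.
    true_theta = [[rng.uniform(0, 1) for _ in range(dim)] for _ in range(n_relays)]
    policy = LinUCB(n_relays, dim, alpha=0.5)
    hits = 0
    for t in range(rounds):
        # Hypothetical context: 2-bit quantized channel-gain features for each relay.
        contexts = [[rng.randint(0, 3) / 3 for _ in range(dim)] for _ in range(n_relays)]
        expected = [sum(w * xi for w, xi in zip(true_theta[a], contexts[a]))
                    for a in range(n_relays)]
        arm = policy.select(contexts)
        reward = expected[arm] + rng.gauss(0, 0.1)  # noisy observed reward
        policy.update(arm, contexts[arm], reward)
        if t >= rounds // 2 and arm == max(range(n_relays), key=expected.__getitem__):
            hits += 1
    return hits / (rounds - rounds // 2)


if __name__ == "__main__":
    print(f"optimal-relay selection rate (second half): {simulate():.2f}")
```

The Sherman-Morrison update keeps each per-relay step at O(d^2) without an explicit matrix inversion. In this toy setting, `alpha` controls how aggressively rarely chosen relays are re-explored.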
