A Novel On-Policy DRL-Based Approach for Resource Allocation in Hybrid RF/VLC Systems

IF 4.3 · CAS Zone 2 (Computer Science) · JCR Q1 (Engineering, Electrical & Electronic)
Tanya Verma;Arif Raza;Shivanshu Shrivastava;Amit Kumar;Dwarkadas Prahladadas Kothari;Umakant Dhar Dwivedi
DOI: 10.1109/TCE.2025.3529846
Journal: IEEE Transactions on Consumer Electronics, vol. 71, no. 1, pp. 550-560
Published: 2025-01-17
Citations: 0

Abstract

Visible light communication (VLC) has emerged as a promising technology, delivering high-speed data transmission for 5G and beyond communications. Nevertheless, its susceptibility to blockages demands co-deployment with traditional radio frequency (RF) systems to ensure uninterrupted connectivity. This co-deployment, known as a hybrid RF/VLC system, is a subset of heterogeneous networks (HetNets) and offers interoperability, energy efficiency, and optimal resource utilization. In hybrid RF/VLC, efficient resource allocation and load balancing are crucial. Existing Deep Q-Network (DQN) learning-based methods designed to address these issues fail in large and dynamic environments. The present study investigates alternative approaches for optimal resource allocation and load balancing in dynamic and large hybrid RF/VLC systems, to achieve maximum data rates for users. We propose two model-free on-policy deep reinforcement learning (DRL) based schemes, namely advantage actor-critic (A2C) and proximal policy optimization (PPO), for efficient resource allocation in hybrid RF/VLC. Simulation results show that the A2C and PPO based schemes outperform the DQN learning scheme by 31.3% and 32.5%, respectively, in terms of data rates. The proposed schemes also outperform the deep deterministic policy gradient (DDPG) in data rate maximization by up to 8.1% and 9.7%, respectively.
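To illustrate the on-policy actor-critic idea behind the proposed schemes, below is a minimal, self-contained A2C-style sketch on a toy link-selection problem: an agent repeatedly chooses between an RF link (steady moderate rate) and a VLC link (higher rate, but occasionally blocked). This is not the paper's system or environment; the rates, blockage probability, learning rates, and the tabular softmax policy are all illustrative assumptions standing in for the deep networks and full hybrid RF/VLC model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hybrid RF/VLC link-selection environment (illustrative only):
# action 0 = RF link: steady moderate data rate.
# action 1 = VLC link: high rate, but blocked with probability P_BLOCK,
#            in which case the achieved rate is near zero.
P_BLOCK = 0.3
RATE_RF, RATE_VLC, RATE_BLOCKED = 1.0, 2.0, 0.1

def step(action):
    """Return the data rate (reward) obtained by the chosen link."""
    if action == 0:
        return RATE_RF
    return RATE_BLOCKED if rng.random() < P_BLOCK else RATE_VLC

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# A2C-style learner: actor = softmax policy over logits,
# critic = scalar value baseline estimating the expected reward.
logits = np.zeros(2)
value = 0.0
alpha_pi, alpha_v = 0.1, 0.1   # actor / critic learning rates

for _ in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)          # sample action on-policy
    r = step(a)
    advantage = r - value               # advantage w.r.t. the baseline
    value += alpha_v * advantage        # critic update
    grad = -probs                       # d log pi(a) / d logits ...
    grad[a] += 1.0                      # ... for a softmax policy
    logits += alpha_pi * advantage * grad  # actor (policy-gradient) update

probs = softmax(logits)
```

Since the VLC link's expected rate (0.7 × 2.0 + 0.3 × 0.1 = 1.43) exceeds the RF link's 1.0, the learned policy should come to favor VLC; swapping the advantage-weighted update for a clipped surrogate objective would give the corresponding PPO-style variant.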
Source journal metrics: CiteScore 7.70 · Self-citation rate 9.30% · Articles per year 59 · Review time 3.3 months
Journal description: The main focus of the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture, or end use of mass-market electronics, systems, software, and services for consumers.