Adaptive Cache Policy Optimization Through Deep Reinforcement Learning in Dynamic Cellular Networks

Ashvin Srinivasan, Mohsen Amidzadeh, Junshan Zhang, Olav Tirkkonen
{"title":"Adaptive Cache Policy Optimization Through Deep Reinforcement Learning in Dynamic Cellular Networks","authors":"Ashvin Srinivasan;Mohsen Amidzadeh;Junshan Zhang;Olav Tirkkonen","doi":"10.23919/ICN.2024.0007","DOIUrl":null,"url":null,"abstract":"We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown time-variant file popularity. With file requests in a time slot being affected by download success in the previous slot, the caching system becomes a non-stationary Partial Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantageous Actor-Critic (A2C) algorithm, comparing Feed Forward Neural Networks (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of file popularity distribution across time slots. Simulation results show that using LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance for the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that reaches the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs, and minimum communication overhead.","PeriodicalId":100681,"journal":{"name":"Intelligent and Converged Networks","volume":"5 2","pages":"81-99"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10601662","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent and Converged Networks","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10601662/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate the traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown, time-variant file popularity. Because file requests in a time slot are affected by download success in the previous slot, the caching system becomes a non-stationary Partially Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantage Actor-Critic (A2C) algorithm, comparing a Feedforward Neural Network (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of the file popularity distribution across time slots. Simulation results show that LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance on the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that meets the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs and minimum communication overhead.
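
To make the learning architecture concrete, below is a minimal sketch of an LSTM-based actor-critic network in PyTorch, of the kind the abstract describes. It is an illustration under assumptions, not the authors' implementation: the observation encoding, the layer sizes, and a per-file caching action space are all assumed here.

```python
# Minimal sketch of an LSTM-based actor-critic (A2C) network. All names,
# dimensions, and the per-file action space are assumptions for illustration;
# the paper's actual architecture may differ.
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    def __init__(self, obs_dim: int, num_files: int, hidden_dim: int = 128):
        super().__init__()
        # The recurrent core carries information across time slots, which is
        # how an LSTM can exploit temporal correlation in file popularity
        # under partial observability; an FFNN baseline has no such memory.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # Actor head: one logit per file for the caching decision (assumed).
        self.actor = nn.Linear(hidden_dim, num_files)
        # Critic head: state-value estimate used to form the advantage.
        self.critic = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq: torch.Tensor, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of per-slot observations,
        # e.g., observed requests and download successes in past slots.
        out, hidden = self.lstm(obs_seq, hidden)
        last = out[:, -1]                       # summary of the history
        cache_logits = self.actor(last)         # which files to cache
        value = self.critic(last).squeeze(-1)   # V(s) for the advantage
        return cache_logits, value, hidden

# Example: 4 parallel rollouts, 10 past slots, 32-dim observations, 100 files.
net = LSTMActorCritic(obs_dim=32, num_files=100)
logits, value, h = net(torch.randn(4, 10, 32))
cache_probs = torch.sigmoid(logits)  # per-file caching probabilities
```

In A2C, the critic's estimate supplies the advantage A = r + γV(s') - V(s) that weights the actor's policy gradient; replacing the LSTM core with fully connected layers yields the FFNN baseline, which carries no memory across slots and is therefore handicapped on a non-stationary POMDP.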