Adaptive Cache Policy Optimization Through Deep Reinforcement Learning in Dynamic Cellular Networks

Ashvin Srinivasan, Mohsen Amidzadeh, Junshan Zhang, Olav Tirkkonen
{"title":"Adaptive Cache Policy Optimization Through Deep Reinforcement Learning in Dynamic Cellular Networks","authors":"Ashvin Srinivasan;Mohsen Amidzadeh;Junshan Zhang;Olav Tirkkonen","doi":"10.23919/ICN.2024.0007","DOIUrl":null,"url":null,"abstract":"We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown time-variant file popularity. With file requests in a time slot being affected by download success in the previous slot, the caching system becomes a non-stationary Partial Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantageous Actor-Critic (A2C) algorithm, comparing Feed Forward Neural Networks (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of file popularity distribution across time slots. Simulation results show that using LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance for the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that reaches the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs, and minimum communication overhead.","PeriodicalId":100681,"journal":{"name":"Intelligent and Converged Networks","volume":"5 2","pages":"81-99"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10601662","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent and Converged Networks","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10601662/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate the traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown, time-variant file popularity. Because file requests in a time slot are affected by download success in the previous slot, the caching system becomes a non-stationary Partially Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantage Actor-Critic (A2C) algorithm, comparing a Feedforward Neural Network (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of the file popularity distribution across time slots. Simulation results show that LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance on the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that meets the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs and minimum communication overhead.
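
To make the learning architecture concrete, below is a minimal sketch of an LSTM-based actor-critic network in PyTorch, of the kind the abstract describes. It is an illustration under assumptions, not the authors' implementation: the observation encoding, the layer sizes, and a per-file caching action space are all assumed here.

```python
# Minimal sketch of an LSTM-based actor-critic (A2C) network. All names,
# dimensions, and the per-file action space are assumptions for illustration;
# the paper's actual architecture may differ.
import torch
import torch.nn as nn

class LSTMActorCritic(nn.Module):
    def __init__(self, obs_dim: int, num_files: int, hidden_dim: int = 128):
        super().__init__()
        # The recurrent core carries information across time slots, which is
        # how an LSTM can exploit temporal correlation in file popularity
        # under partial observability; an FFNN baseline has no such memory.
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        # Actor head: one logit per file for the caching decision (assumed).
        self.actor = nn.Linear(hidden_dim, num_files)
        # Critic head: state-value estimate used to form the advantage.
        self.critic = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq: torch.Tensor, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of per-slot observations,
        # e.g., observed requests and download successes in past slots.
        out, hidden = self.lstm(obs_seq, hidden)
        last = out[:, -1]                       # summary of the history
        cache_logits = self.actor(last)         # which files to cache
        value = self.critic(last).squeeze(-1)   # V(s) for the advantage
        return cache_logits, value, hidden

# Example: 4 parallel rollouts, 10 past slots, 32-dim observations, 100 files.
net = LSTMActorCritic(obs_dim=32, num_files=100)
logits, value, h = net(torch.randn(4, 10, 32))
cache_probs = torch.sigmoid(logits)  # per-file caching probabilities
```

In A2C, the critic's estimate supplies the advantage A = r + γV(s') - V(s) that weights the actor's policy gradient; replacing the LSTM core with fully connected layers yields the FFNN baseline, which carries no memory across slots and is therefore handicapped on a non-stationary POMDP.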