Adaptive Cache Policy Optimization Through Deep Reinforcement Learning in Dynamic Cellular Networks

Intelligent and Converged Networks, Pub Date: 2024-06-01, DOI: 10.23919/ICN.2024.0007

Ashvin Srinivasan; Mohsen Amidzadeh; Junshan Zhang; Olav Tirkkonen

Volume 5, Issue 2, Pages 81-99. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10601662/

Citations: 0


Adaptive Cache Policy Optimization Through Deep Reinforcement Learning in Dynamic Cellular Networks

We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate the traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown time-variant file popularity. Because file requests in a time slot are affected by download success in the previous slot, the caching system becomes a non-stationary Partially Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantage Actor-Critic (A2C) algorithm, comparing Feed-Forward Neural Networks (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of the file popularity distribution across time slots. Simulation results show that LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance for the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that reaches the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs and minimum communication overhead.
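The abstract names A2C but does not spell out its update quantities. As a minimal sketch, assuming the textbook formulation rather than the authors' exact variant, the actor is trained on the advantage: the discounted return of a rollout minus the critic's value estimate for the same state. The function names, the reward and value numbers, and the discount factor below are all hypothetical and chosen only for illustration.

```python
def discounted_returns(rewards, gamma, bootstrap_value=0.0):
    """Compute G_t = r_t + gamma * G_{t+1}, seeded with a bootstrap value
    (the critic's estimate for the state after the last step, or 0 at episode end)."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma, bootstrap_value=0.0):
    """A_t = G_t - V(s_t): how much better the rollout did than the critic expected."""
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    return [g - v for g, v in zip(returns, values)]


# Hypothetical per-time-slot rewards and critic estimates (not from the paper).
rewards = [1.0, 0.0, 2.0]
values = [1.0, 1.0, 1.0]
adv = advantages(rewards, values, gamma=0.5)
print(adv)
```

In a full agent, each advantage would weight the log-probability of the action taken in the actor loss, while the critic would be regressed toward the returns. Whether the networks producing `values` are feed-forward or LSTM-based does not change this computation.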


Source journal

Intelligent and Converged Networks

Self-citation rate

0.00%

Articles published