Reinforcement Learning for Proactive Caching of Contents with Different Demand Probabilities

S. Somuyiwa, Deniz Gündüz, A. György
{"title":"不同需求概率内容的主动缓存强化学习","authors":"S. Somuyiwa, Deniz Gündüz, A. György","doi":"10.1109/ISWCS.2018.8491205","DOIUrl":null,"url":null,"abstract":"A mobile user randomly accessing a dynamic content library over a wireless channel is considered. At each time instant, a random number of contents are added to the library and each content remains relevant to the user for a random period of time. Contents are classified into finitely many classes such that whenever the user accesses the system, he requests each content randomly with a class-specific demand probability. Contents are downloaded to the user equipment (UE) through a wireless link whose quality also varies randomly with time. The UE has a cache memory of finite capacity, which can be used to proactively store contents before they are requested by the user. Any time contents are downloaded, the system incurs a cost (energy, bandwidth, etc.) that depends on the channel state at the time of download, and scales linearly with the number of contents downloaded. Our goal is to minimize the expected long-term average cost. The problem is modeled as a Markov decision process, and the optimal policy is shown to exhibit a threshold structure; however, since finding the optimal policy is computationally infeasible, parametric approximations to the optimal policy are considered, whose parameters are optimized using the policy gradient method. Numerical simulations show that the performance gain of the resulting scheme over traditional reactive content delivery is significant, and increases with the cache capacity. Comparisons with two performance lower bounds, one computed based on infinite cache capacity and another based on non-casual knowledge of the user access times and content requests, demonstrate that our scheme can perform close to the theoretical optimum.","PeriodicalId":272951,"journal":{"name":"2018 15th International Symposium on Wireless Communication Systems (ISWCS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Reinforcement Learning for Proactive Caching of Contents with Different Demand Probabilities\",\"authors\":\"S. Somuyiwa, Deniz Gündüz, A. György\",\"doi\":\"10.1109/ISWCS.2018.8491205\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A mobile user randomly accessing a dynamic content library over a wireless channel is considered. At each time instant, a random number of contents are added to the library and each content remains relevant to the user for a random period of time. Contents are classified into finitely many classes such that whenever the user accesses the system, he requests each content randomly with a class-specific demand probability. Contents are downloaded to the user equipment (UE) through a wireless link whose quality also varies randomly with time. The UE has a cache memory of finite capacity, which can be used to proactively store contents before they are requested by the user. Any time contents are downloaded, the system incurs a cost (energy, bandwidth, etc.) that depends on the channel state at the time of download, and scales linearly with the number of contents downloaded. Our goal is to minimize the expected long-term average cost. 
The problem is modeled as a Markov decision process, and the optimal policy is shown to exhibit a threshold structure; however, since finding the optimal policy is computationally infeasible, parametric approximations to the optimal policy are considered, whose parameters are optimized using the policy gradient method. Numerical simulations show that the performance gain of the resulting scheme over traditional reactive content delivery is significant, and increases with the cache capacity. Comparisons with two performance lower bounds, one computed based on infinite cache capacity and another based on non-casual knowledge of the user access times and content requests, demonstrate that our scheme can perform close to the theoretical optimum.\",\"PeriodicalId\":272951,\"journal\":{\"name\":\"2018 15th International Symposium on Wireless Communication Systems (ISWCS)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 15th International Symposium on Wireless Communication Systems (ISWCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISWCS.2018.8491205\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 15th International Symposium on Wireless Communication Systems (ISWCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISWCS.2018.8491205","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 9

Abstract

A mobile user randomly accessing a dynamic content library over a wireless channel is considered. At each time instant, a random number of contents are added to the library and each content remains relevant to the user for a random period of time. Contents are classified into finitely many classes such that whenever the user accesses the system, he requests each content randomly with a class-specific demand probability. Contents are downloaded to the user equipment (UE) through a wireless link whose quality also varies randomly with time. The UE has a cache memory of finite capacity, which can be used to proactively store contents before they are requested by the user. Any time contents are downloaded, the system incurs a cost (energy, bandwidth, etc.) that depends on the channel state at the time of download, and scales linearly with the number of contents downloaded. Our goal is to minimize the expected long-term average cost. The problem is modeled as a Markov decision process, and the optimal policy is shown to exhibit a threshold structure; however, since finding the optimal policy is computationally infeasible, parametric approximations to the optimal policy are considered, whose parameters are optimized using the policy gradient method. Numerical simulations show that the performance gain of the resulting scheme over traditional reactive content delivery is significant, and increases with the cache capacity. Comparisons with two performance lower bounds, one computed based on infinite cache capacity and another based on non-causal knowledge of the user access times and content requests, demonstrate that our scheme can perform close to the theoretical optimum.
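To make the approach described above concrete, the following is a minimal, self-contained Python sketch of a threshold-like parametric caching policy trained with a REINFORCE-style policy gradient. It is not the authors' implementation: the environment constants (class demand probabilities, content lifetimes, access probability, cache size, cost distribution) and the specific sigmoid policy parameterization are illustrative assumptions chosen only to exercise the technique the abstract names.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- illustrative environment constants (assumptions, not taken from the paper) ---
N_CLASSES = 3
DEMAND_PROB = np.array([0.6, 0.3, 0.1])  # class-specific demand probabilities
LIFETIME_P = 0.1                         # per-slot prob. that a content stops being relevant
ACCESS_PROB = 0.2                        # per-slot prob. that the user accesses the system
CACHE_SIZE = 4                           # finite UE cache capacity
HORIZON = 300                            # slots per simulated episode
EPISODES = 3000
LR = 0.02


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def run_episode(theta):
    """Simulate one episode of the simplified caching MDP.

    Returns the total download cost and the accumulated score function
    (gradient of the log action probabilities) used by REINFORCE.
    """
    library = []          # [class, is_cached] for every currently relevant content
    n_cached = 0
    total_cost = 0.0
    score = np.zeros_like(theta)

    for _ in range(HORIZON):
        channel_cost = rng.uniform(0.2, 1.0)   # random channel state -> cost per download

        # One new content of a uniformly random class is added to the library.
        cls = int(rng.integers(N_CLASSES))
        # Threshold-like stochastic policy: cache with prob. sigmoid(theta_k0 - theta_k1 * cost).
        p = sigmoid(theta[cls, 0] - theta[cls, 1] * channel_cost)
        room = n_cached < CACHE_SIZE
        act = room and (rng.random() < p)
        if room:                               # the policy only controls the action when there is room
            dz = (1.0 - p) if act else -p      # d/dz of log Bernoulli(p), with z = theta_k0 - theta_k1 * cost
            score[cls, 0] += dz
            score[cls, 1] += -channel_cost * dz
        if act:
            total_cost += channel_cost         # proactive download paid at the current cost
            n_cached += 1
        library.append([cls, act])

        # User access: each relevant content is requested with its class demand probability;
        # requests that miss the cache are served reactively at the current channel cost.
        if rng.random() < ACCESS_PROB:
            for item in library:
                if rng.random() < DEMAND_PROB[item[0]]:
                    if item[1]:
                        item[1] = False        # cache hit, slot freed
                        n_cached -= 1
                    else:
                        total_cost += channel_cost

        # Contents stop being relevant after a geometric lifetime.
        survivors = []
        for item in library:
            if rng.random() < LIFETIME_P:
                if item[1]:
                    n_cached -= 1              # evict the now-useless cached copy
            else:
                survivors.append(item)
        library = survivors

    return total_cost, score


# --- REINFORCE-style optimisation of the threshold parameters ---
theta = np.zeros((N_CLASSES, 2))
baseline = None
for ep in range(EPISODES):
    cost, score = run_episode(theta)
    baseline = cost if baseline is None else 0.99 * baseline + 0.01 * cost
    theta -= LR * (cost - baseline) * score / HORIZON   # gradient step that lowers expected cost
    if (ep + 1) % 1000 == 0:
        print(f"episode {ep + 1}: cost per slot = {cost / HORIZON:.3f}")
```

The sigmoid policy mimics the threshold structure noted in the abstract: for each content class, a new content is cached only when the current channel cost falls (probabilistically) below a learned class-specific threshold, and the policy-gradient update shifts those thresholds to reduce the long-term average download cost.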