Whittle Index-Based Q-Learning for Wireless Edge Caching With Linear Function Approximation

IF 3.6; Zone 3 (Computer Science); JCR Q2: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE/ACM Transactions on Networking. Pub Date: 2024-06-24. DOI: 10.1109/TNET.2024.3417351

Guojun Xiong; Shufan Wang; Jian Li; Rahul Singh

Published in IEEE/ACM Transactions on Networking, vol. 32, no. 5, pp. 4286-4301. Source: https://ieeexplore.ieee.org/document/10570315/

Citations: 0

Abstract

We consider the problem of content caching at the wireless edge to serve a set of end users via unreliable wireless channels, so as to minimize the average latency experienced by end users given the constrained wireless edge cache capacity. We formulate this problem as a Markov decision process, or more specifically a restless multi-armed bandit problem, which is provably hard to solve. We begin by investigating a discounted counterpart, and prove that it admits an optimal policy of threshold type. We then show that this result also holds for the average latency problem. Using this structural result, we establish the indexability of our problem, and employ the Whittle index policy to minimize average latency. Since system parameters such as content request rates and wireless channel conditions are often unknown and time-varying, we further develop a model-free reinforcement learning algorithm, dubbed Q+-Whittle, that relies on the Whittle index policy. However, Q+-Whittle requires storing the Q-function values for all state-action pairs, the number of which can be extremely large for wireless edge caching. To this end, we approximate the Q-function by a parameterized function class with a much smaller dimension, and further design a Q+-Whittle algorithm with linear function approximation, called Q+-Whittle-LFA. We provide a finite-time bound on the mean-square error of Q+-Whittle-LFA. Simulation results using real traces demonstrate that Q+-Whittle-LFA yields excellent empirical performance.
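As a rough illustration of the two generic ingredients the abstract names, the sketch below shows (a) an index policy that activates the arms with the largest indices up to a capacity budget, and (b) one semi-gradient Q-learning step with a linear Q-function. It is not the paper's Q+-Whittle-LFA algorithm: the feature map, the step size `alpha`, the discount factor `gamma`, the binary action set, and all function names are assumptions chosen for illustration, and the index values are taken as given rather than learned.

```python
def top_k_by_index(indices, capacity):
    """Index policy: activate the `capacity` arms with the largest indices.

    `indices` holds one (already computed) index value per arm; how those
    values are obtained is outside the scope of this sketch.
    """
    ranked = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return set(ranked[:capacity])


def features(state, action, num_states):
    """Hypothetical feature map: [1, x, x^2] of the normalized state,
    placed in a separate block for each of the two actions (0 or 1)."""
    x = state / (num_states - 1)
    block = [1.0, x, x * x]
    zeros = [0.0] * len(block)
    return block + zeros if action == 0 else zeros + block


def q_value(theta, phi):
    """Linear Q-function: inner product of weights and features."""
    return sum(t * p for t, p in zip(theta, phi))


def lfa_q_update(theta, s, a, cost, s_next, num_states, alpha=0.05, gamma=0.9):
    """One semi-gradient Q-learning step for a discounted cost-minimization
    problem (assumed here for simplicity). Returns the updated weights."""
    phi = features(s, a, num_states)
    # Costs are minimized, so bootstrap from the smallest next-state Q-value.
    best_next = min(q_value(theta, features(s_next, b, num_states)) for b in (0, 1))
    td_error = cost + gamma * best_next - q_value(theta, phi)
    return [t + alpha * td_error * p for t, p in zip(theta, phi)]


if __name__ == "__main__":
    # Four arms, budget of two: the arms with indices 1.5 and 0.9 are activated.
    print(top_k_by_index([0.2, 1.5, -0.3, 0.9], capacity=2))
    theta = [0.0] * 6
    theta = lfa_q_update(theta, s=0, a=1, cost=1.0, s_next=2, num_states=5)
    print(theta)
```

The point of the linear parameterization is visible in the sizes: the weight vector has six entries regardless of `num_states`, whereas a tabular Q-function would need one entry per state-action pair.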




Source journal

IEEE/ACM Transactions on Networking (Engineering & Technology: Telecommunications)

CiteScore: 8.20

Self-citation rate: 5.40%

Number of publications: 246

Review time: 4-8 weeks

About the journal: The IEEE/ACM Transactions on Networking's high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media, e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic such as underwater, infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.