Authors: Zhiwu Guo; Chicheng Zhang; Ming Li; Marwan Krunz
Journal: IEEE Transactions on Machine Learning in Communications and Networking, vol. 2, pp. 994-1016
Publication date: 2024-07-01
Publication type: Journal Article
DOI: 10.1109/TMLCN.2024.3421170
URL: https://ieeexplore.ieee.org/document/10579843/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579843
Citations: 0
Abstract
Fair Probabilistic Multi-Armed Bandit With Applications to Network Optimization
Online learning, particularly Multi-Armed Bandit (MAB) algorithms, has been widely adopted in real-world networking applications. In some of these applications, such as the fair coexistence of heterogeneous networks, multiple links (individual arms) are selected in each round, and the throughputs (rewards) of these arms depend on the chosen set of links. Ensuring fairness among individual arms is also a critical objective. Existing MAB algorithms are unsuitable for these applications because they rest on different models and assumptions. In this paper, we introduce a new fair probabilistic MAB (FP-MAB) problem that aims either to maximize the minimum reward across all arms or to maximize the total reward subject to a fairness constraint that guarantees a minimum selection fraction for each arm. In FP-MAB, the learning agent probabilistically selects a meta-arm in each decision round; each meta-arm is associated with one or more individual arms. To address the FP-MAB problem, we propose two algorithms: Fair Probabilistic Explore-Then-Commit (FP-ETC) and Fair Probabilistic Optimism In the Face of Uncertainty (FP-OFU). We also introduce a novel notion of regret for the max-min fairness objective. We analyze the performance of FP-ETC and FP-OFU by deriving upper bounds on the average regret and the average constraint violation. Simulation results demonstrate that FP-ETC and FP-OFU achieve lower regret (or higher objective values) than existing MAB algorithms under the same fairness requirements.
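To make the problem setup concrete, the following is a minimal Python sketch of the quantities the abstract names: the expected reward of each arm under a probability distribution over meta-arms, each arm's selection fraction, and the shortfall against a minimum-fraction constraint. It is not the paper's FP-ETC or FP-OFU algorithm. The toy instance (`MEAN_REWARD`, `MEMBERS`), all function names, and the brute-force grid search are assumptions introduced purely for illustration, and the mean rewards are taken as known rather than learned.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical toy instance (not from the paper): 2 links, 3 meta-arms.
# MEAN_REWARD[m][k] is the assumed mean throughput of link k when meta-arm m
# is played; 0.0 means link k is not part of meta-arm m.
MEAN_REWARD: List[List[float]] = [
    [1.0, 0.0],  # meta-arm 0: link 0 transmits alone
    [0.0, 0.8],  # meta-arm 1: link 1 transmits alone
    [0.5, 0.3],  # meta-arm 2: both links transmit and interfere
]
# MEMBERS[m][k] is 1 if link k belongs to meta-arm m, else 0.
MEMBERS: List[List[int]] = [[1, 0], [0, 1], [1, 1]]


def per_arm_reward(p: Sequence[float], mean_reward: Sequence[Sequence[float]]) -> List[float]:
    """Expected reward of each arm when meta-arm m is drawn with probability p[m]."""
    n_arms = len(mean_reward[0])
    return [sum(p[m] * mean_reward[m][k] for m in range(len(p))) for k in range(n_arms)]


def selection_fraction(p: Sequence[float], members: Sequence[Sequence[int]]) -> List[float]:
    """Expected fraction of rounds in which each arm is selected."""
    n_arms = len(members[0])
    return [sum(p[m] * members[m][k] for m in range(len(p))) for k in range(n_arms)]


def constraint_violation(p: Sequence[float], members: Sequence[Sequence[int]],
                         min_fraction: Sequence[float]) -> List[float]:
    """Per-arm shortfall below the required minimum selection fraction."""
    fractions = selection_fraction(p, members)
    return [max(0.0, min_fraction[k] - f) for k, f in enumerate(fractions)]


def grid_max_min(mean_reward: Sequence[Sequence[float]], steps: int = 20) -> Tuple[List[float], float]:
    """Brute-force search over a coarse grid on the probability simplex for
    the distribution that maximizes the minimum per-arm expected reward."""
    n_meta = len(mean_reward)
    best_p: List[float] = []
    best_val = float("-inf")
    for counts in product(range(steps + 1), repeat=n_meta - 1):
        if sum(counts) > steps:
            continue  # the last probability would be negative
        p = [c / steps for c in counts] + [(steps - sum(counts)) / steps]
        val = min(per_arm_reward(p, mean_reward))
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val


if __name__ == "__main__":
    p = [0.4, 0.4, 0.2]
    print("per-arm reward:", per_arm_reward(p, MEAN_REWARD))
    print("selection fraction:", selection_fraction(p, MEMBERS))
    print("violation vs. 0.7 minimum:", constraint_violation(p, MEMBERS, [0.7, 0.7]))
    print("grid max-min:", grid_max_min(MEAN_REWARD))
```

On this toy instance the grid search settles on roughly (0.45, 0.55, 0) with a minimum reward of about 0.44, which no single meta-arm played deterministically can match. A learning algorithm would have to estimate the mean rewards from observed throughputs instead of reading them from a table, and a linear-program solver would replace the grid search for anything beyond a handful of meta-arms.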