Achieving Fairness in the Stochastic Multi-armed Bandit Problem

Vishakha Patil, Ganesh Ghalme, V. Nair, Y. Narahari
DOI: 10.1609/AAAI.V34I04.5986
Journal: J. Mach. Learn. Res., vol. 15, no. 1, pp. 174:1-174:31
Published: 2019-07-23
Citations: 84

Abstract

We study an interesting variant of the stochastic multi-armed bandit problem, called the Fair-SMAB problem, where each arm is required to be pulled for at least a given fraction of the total available rounds. We investigate the interplay between learning and fairness in terms of a pre-specified vector denoting the fractions of guaranteed pulls. We define a fairness-aware regret, called $r$-Regret, that takes into account the above fairness constraints and naturally extends the conventional notion of regret. Our primary contribution is characterizing a class of Fair-SMAB algorithms by two parameters: the unfairness tolerance and the learning algorithm used as a black-box. We provide a fairness guarantee for this class that holds uniformly over time irrespective of the choice of the learning algorithm. In particular, when the learning algorithm is UCB1, we show that our algorithm achieves $O(\ln T)$ $r$-Regret. Finally, we evaluate the cost of fairness in terms of the conventional notion of regret.
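The abstract describes a two-part design: an unfairness-tolerance parameter and a black-box learner (e.g. UCB1). A minimal sketch of that idea, under assumptions, is shown below: each arm `i` owes a quota `r[i]` of the pulls; whenever some arm falls more than a tolerance `alpha` behind its quota, the most-behind arm is pulled, and otherwise the round is delegated to UCB1. The function name, interface, and tie-breaking are hypothetical, not the authors' exact algorithm.

```python
import math

def fair_ucb(reward_fn, n_arms, horizon, r, alpha=0.0):
    """Fairness-aware bandit sketch (assumed interface).

    r[i] is the guaranteed fraction of pulls for arm i (sum(r) < 1);
    alpha is an assumed unfairness-tolerance slack. Returns pull counts.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        # Fairness deficit of each arm after t-1 completed rounds.
        deficits = [r[i] * (t - 1) - counts[i] for i in range(n_arms)]
        if max(deficits) > alpha:
            # Some arm is behind its quota: pull the most-behind arm.
            arm = max(range(n_arms), key=lambda i: deficits[i])
        elif 0 in counts:
            # Initialization: pull each arm once before computing indices.
            arm = counts.index(0)
        else:
            # Black-box learner: UCB1 index = mean + exploration bonus.
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = reward_fn(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

With `alpha = 0` the deficit check fires as soon as any arm drops below its quota, so each arm's pull count stays within a constant of `r[i] * t` uniformly over time, which mirrors the anytime fairness guarantee claimed in the abstract; the rounds not consumed by quota enforcement are spent by UCB1, which is where the O(ln T) r-Regret bound comes from.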