Best-of-Both-Worlds Algorithms for Partial Monitoring

Taira Tsuchiya, Shinji Ito, Junya Honda
{"title":"Best-of-Both-Worlds Algorithms for Partial Monitoring","authors":"Taira Tsuchiya, Shinji Ito, J. Honda","doi":"10.48550/arXiv.2207.14550","DOIUrl":null,"url":null,"abstract":"This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \\log(T) \\log(k_{\\Pi} T) / \\Delta_{\\min})$ in the stochastic regime and $O(m k^{2/3} \\sqrt{T \\log(T) \\log k_{\\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\\Delta_{\\min}$ is the minimum suboptimality gap, and $k_{\\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\\mathcal{G}}^2 \\log(T) \\log(k_{\\Pi} T) / \\Delta_{\\min}^2)$ in the stochastic regime and $O((c_{\\mathcal{G}}^2 \\log(T) \\log(k_{\\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Algorithmic Learning Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2207.14550","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

This study considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regret is favorably bounded in both the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader (FTRL) framework, and are inspired by the exploration-by-optimization approach and by adaptive learning rates from the literature on online learning with feedback graphs.
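To make the algorithmic ingredients of the last sentence concrete, the sketch below shows FTRL with the negative Shannon entropy regularizer and a decreasing learning rate over $k$ actions. This is only a minimal illustration, not the paper's algorithm: it omits exploration by optimization and the partial-monitoring loss estimators, and it assumes (unrealistically for this setting) that the full loss vector is observed each round. The function names and the schedule $\eta_t = \sqrt{\log(k)/t}$ are illustrative assumptions.

```python
import numpy as np

def ftrl_distribution(cum_loss, eta):
    """Closed-form FTRL step with negative Shannon entropy:
    argmin_{p in simplex} <p, cum_loss> + (1/eta) * sum_i p_i log p_i
    = softmax(-eta * cum_loss)."""
    z = -eta * (cum_loss - cum_loss.min())  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def run_toy_ftrl(losses):
    """Toy full-information driver over a (T, k) array of loss vectors.
    In actual partial monitoring the learner only sees a feedback symbol
    and must build loss estimates from it (NOT shown here)."""
    rng = np.random.default_rng(0)
    T, k = losses.shape
    cum_loss = np.zeros(k)
    total = 0.0
    for t in range(1, T + 1):
        eta = np.sqrt(np.log(k) / t)        # illustrative decreasing schedule
        p = ftrl_distribution(cum_loss, eta)
        a = rng.choice(k, p=p)              # play a sampled action
        total += losses[t - 1, a]
        cum_loss += losses[t - 1]           # full-information update (toy)
    return total

if __name__ == "__main__":
    losses = np.random.default_rng(1).random((1000, 5))
    print(run_toy_ftrl(losses))
```

With the negative-entropy regularizer, the FTRL minimizer over the probability simplex has the closed-form softmax used above; in the paper's setting, the observed loss vector would be replaced by an unbiased estimate built from the game's feedback structure, and the learning rate would adapt to observed quantities rather than follow a fixed schedule.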