Adversarial Online Multi-Task Reinforcement Learning

Quan Nguyen, Nishant A. Mehta
Published at the International Conference on Algorithmic Learning Theory, 2023-01-11. DOI: 10.48550/arXiv.2301.04268 (https://doi.org/10.48550/arXiv.2301.04268).

Abstract

We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set $\mathcal{M}$ of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $\lambda$-separability, and show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of $\Omega(K\sqrt{DSAH})$ on the regret of any learning algorithm and an instance-specific lower bound of $\Omega(\frac{K}{\lambda^2})$ on the sample complexity of a class of uniformly-good cluster-then-learn algorithms. We use a novel construction called 2-JAO MDP to prove the instance-specific lower bound. The lower bounds are complemented with a polynomial-time algorithm that obtains a $\tilde{O}(\frac{K}{\lambda^2})$ sample complexity guarantee for the clustering phase and a $\tilde{O}(\sqrt{MK})$ regret guarantee for the learning phase, indicating that the dependency on $K$ and $\frac{1}{\lambda^2}$ is tight.
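To make the cluster-then-learn template concrete, the following is a minimal, hypothetical sketch of its clustering phase under a $\lambda$-separability assumption. It is not the paper's actual algorithm: the function name, the use of a single scalar statistic per episode (e.g., an empirical mean reward), and the greedy threshold rule are all illustrative assumptions. The idea it demonstrates is that if the tasks' statistics are separated by at least $\lambda$ and the per-episode estimates concentrate to within $\lambda/2$ of the true values, a simple threshold test assigns every episode to the correct task cluster.

```python
import random

def separability_clustering(episode_stats, lam):
    """Greedy threshold clustering: an episode joins an existing cluster
    iff its empirical statistic is within lam / 2 of that cluster's
    center; otherwise it starts a new cluster.

    episode_stats: list of floats, one empirical statistic per episode.
    Returns (labels, centers): cluster index per episode, and the
    statistic of the first episode seen in each cluster.
    """
    centers, labels = [], []
    for x in episode_stats:
        for i, c in enumerate(centers):
            if abs(x - c) < lam / 2:
                labels.append(i)
                break
        else:  # no existing center is close enough: open a new cluster
            centers.append(x)
            labels.append(len(centers) - 1)
    return labels, centers

# Toy demo with M = 2 well-separated tasks (true statistics 0.1 and 0.9,
# so lam = 0.8) and small estimation noise in each episode's statistic.
random.seed(0)
stats = [0.1 + random.gauss(0, 0.01) for _ in range(5)] + \
        [0.9 + random.gauss(0, 0.01) for _ in range(5)]
labels, centers = separability_clustering(stats, lam=0.8)
```

With noise far below $\lambda/2 = 0.4$, the first five episodes land in one cluster and the last five in another, after which a per-cluster single-task learner can be run on each group.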