International Conference on Algorithmic Learning Theory: Latest Papers (Page 2)

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

International Conference on Algorithmic Learning Theory Pub Date : 2022-10-04 DOI: 10.48550/arXiv.2210.01376

Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

Abstract: We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015] which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any $\text{poly}(K)$ dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without self-loop in a strongly observable graph.

Citations: 2
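
The quantities named in the abstract of the entry above can be made concrete on small graphs. The sketch below is only an illustration and not the paper's algorithm: it assumes a feedback graph is given as a list of directed edges `(u, v)`, read as "playing arm `u` reveals the loss of arm `v`", and uses the usual definitions of independence number and strong observability. The function names and the helper `regret_rate`, which drops constants and logarithmic factors, are hypothetical.

```python
from itertools import combinations
from math import sqrt


def independence_number(num_nodes, edges):
    """Size of the largest node set with no edge (either direction) inside it.

    Brute force over subsets, so only suitable for small graphs.
    """
    adjacent = {(u, v) for u, v in edges if u != v}  # self-loops are ignored
    adjacent |= {(v, u) for u, v in adjacent}        # treat edges as undirected
    for size in range(num_nodes, 0, -1):
        for subset in combinations(range(num_nodes), size):
            if all((u, v) not in adjacent for u, v in combinations(subset, 2)):
                return size
    return 0


def is_strongly_observable(num_nodes, edges):
    """Every node has a self-loop or is observed by all other nodes."""
    edge_set = set(edges)
    for v in range(num_nodes):
        has_self_loop = (v, v) in edge_set
        seen_by_others = all((u, v) in edge_set for u in range(num_nodes) if u != v)
        if not (has_self_loop or seen_by_others):
            return False
    return True


def regret_rate(alphas):
    """sqrt(sum_t alpha_t) + max_t alpha_t, ignoring constants and log factors."""
    return sqrt(sum(alphas)) + max(alphas)
```

With self-loops only (the standard bandit setting), `independence_number` equals the number of arms; with every edge present (full information), it equals 1.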

Max-Quantile Grouped Infinite-Arm Bandits

International Conference on Algorithmic Learning Theory Pub Date : 2022-10-04 DOI: 10.48550/arXiv.2210.01295

Ivan Lau, Yan Hao Ling, Mayank Shrivastava, J. Scarlett

Citations: 0

Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States

International Conference on Algorithmic Learning Theory Pub Date : 2022-10-03 DOI: 10.48550/arXiv.2210.00997

C. Tsai, Hao-Chung Cheng, Yen-Huan Li

Abstract: Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O}(T^{2/3} d^{1/3})$ when $T > 4d/\log d$. This improves on the original $\tilde{O}(T^{3/4} d^{1/2})$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{Td})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms. 3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O}(\sqrt{Td})$. Its per-iteration time is shorter than all existing algorithms we know.

Citations: 5
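
For readers unfamiliar with the online portfolio selection setting in the entry above, the following is a minimal sketch of the plain exponentiated gradient update for the loss $-\log\langle w, x_t\rangle$, where $x_t$ is the vector of price relatives at round $t$. It is not the $\widetilde{\text{EG}}$ variant analyzed in the paper; the function names, the step size `eta`, and the toy data are assumptions made for illustration.

```python
import math


def eg_update(weights, price_relatives, eta):
    """One exponentiated gradient step on the loss -log(<weights, price_relatives>)."""
    wealth_ratio = sum(w * x for w, x in zip(weights, price_relatives))
    # The negative gradient of the loss in coordinate i is x_i / <w, x>.
    scaled = [w * math.exp(eta * x / wealth_ratio)
              for w, x in zip(weights, price_relatives)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize onto the simplex


def run_eg(price_history, eta=0.05):
    """Run EG from the uniform portfolio; return final weights and log-wealth."""
    d = len(price_history[0])
    weights = [1.0 / d] * d
    log_wealth = 0.0
    for x in price_history:
        log_wealth += math.log(sum(w * xi for w, xi in zip(weights, x)))
        weights = eg_update(weights, x, eta)
    return weights, log_wealth
```

For example, `run_eg([[2.0, 1.0]] * 20, eta=0.1)` gradually shifts weight toward the first asset, whose price doubles every round.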

Robust Empirical Risk Minimization with Tolerance

International Conference on Algorithmic Learning Theory Pub Date : 2022-10-02 DOI: 10.48550/arXiv.2210.00635

Robi Bhattacharjee, Max Hopkins, Akash Kumar, Hantao Yu, Kamalika Chaudhuri

Abstract: Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need. In this work we study the fundamental paradigm of (robust) $\textit{empirical risk minimization}$ (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to 'nice' settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called $\textit{tolerant}$ robust learning (Ashtiani et al., 2022) where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for $\gamma$-tolerant robust learning VC classes over $\mathbb{R}^d$, and requires only $\tilde{O}\left(\frac{VC(H)\, d \log \frac{D}{\gamma\delta}}{\epsilon^2}\right)$ samples for robustness regions of (maximum) diameter $D$.

Citations: 4

The Replicator Dynamic, Chain Components and the Response Graph

International Conference on Algorithmic Learning Theory Pub Date : 2022-09-30 DOI: 10.48550/arXiv.2209.15230

O. Biggar, I. Shames

Citations: 2

On The Computational Complexity of Self-Attention

International Conference on Algorithmic Learning Theory Pub Date : 2022-09-11 DOI: 10.48550/arXiv.2209.04881

Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, C. Hegde

Citations: 19

Best-of-Both-Worlds Algorithms for Partial Monitoring

International Conference on Algorithmic Learning Theory Pub Date : 2022-07-29 DOI: 10.48550/arXiv.2207.14550

Taira Tsuchiya, Shinji Ito, J. Honda

Abstract: This study considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.

Citations: 5

Online Learning with Off-Policy Feedback

International Conference on Algorithmic Learning Theory Pub Date : 2022-07-18 DOI: 10.48550/arXiv.2207.08956

Germano Gabbianelli, M. Papini, Gergely Neu

Citations: 3

Optimistic PAC Reinforcement Learning: the Instance-Dependent View

International Conference on Algorithmic Learning Theory Pub Date : 2022-07-12 DOI: 10.48550/arXiv.2207.05852

Andrea Tirinzoni, Aymen Al Marjani, E. Kaufmann

Abstract: Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2021) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near-optimal. On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.

Citations: 6

Reconstructing Ultrametric Trees from Noisy Experiments

International Conference on Algorithmic Learning Theory Pub Date : 2022-06-15 DOI: 10.48550/arXiv.2206.07672

Eshwar Ram Arunachaleswaran, Anindya De, Sampath Kannan

Abstract: The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of 'experiments' on triples of leaves (a,b,c), which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can be equivalently seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these 'experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are very close to each other in the tree, the result of the experiment should be close to uniformly random. Motivated by this, we consider a model where the noise on any triple is just dependent on the three pairwise distances (referred to as distance based noise). Our results are the following: 1. Suppose the length of every edge in the unknown tree is at least $\tilde{O}(\frac{1}{\sqrt n})$ fraction of the length of a root-leaf path. Then, we give an efficient algorithm to reconstruct the topology of the tree for a broad family of distance-based noise models. Further, we show that if the edges are asymptotically shorter, then topology reconstruction is information-theoretically impossible. 2. Further, for a specific distance-based noise model, which we refer to as the homogeneous noise model, we show that the edge weights can also be approximately reconstructed under the same quantitative lower bound on the edge lengths.

Citations: 0
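
The noiseless triple 'experiment' described in the entry above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's reconstruction algorithm: the tree is encoded as nested 2-tuples with leaf labels as strings, depth is counted in edges rather than edge lengths, and no noise model is included. All names are hypothetical.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: path}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (index,)))
    return paths


def lca_depth(path_a, path_b):
    """Depth (in edges) of the least common ancestor of two leaves."""
    depth = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        depth += 1
    return depth


def triple_experiment(tree, a, b, c):
    """Return the pair among (a, b, c) with the deepest least common ancestor."""
    paths = leaf_paths(tree)
    return max(combinations((a, b, c), 2),
               key=lambda pair: lca_depth(paths[pair[0]], paths[pair[1]]))
```

On the toy tree `(("human", "chimp"), ("mouse", "rat"))`, querying the triple `human`, `mouse`, `chimp` returns the pair `human` and `chimp`, whose common ancestor lies below the root.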