International Conference on Algorithmic Learning Theory: Latest Publications

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs
International Conference on Algorithmic Learning Theory Pub Date : 2022-10-04 DOI: 10.48550/arXiv.2210.01376
Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang
We study high-probability regret bounds for adversarial $K$-armed bandits with time-varying feedback graphs over $T$ rounds. For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T \alpha_t)^{1/2} + \max_{t\in[T]} \alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$. Compared to the best existing result [Neu, 2015], which only considers graphs with self-loops for all nodes, our result not only holds more generally, but importantly also removes any $\text{poly}(K)$ dependence that can be prohibitively large for applications such as contextual bandits. Furthermore, we also develop the first algorithm that achieves the optimal high-probability regret bound for weakly observable graphs, which even improves the best expected regret bound of [Alon et al., 2015] by removing the $\mathcal{O}(\sqrt{KT})$ term with a refined analysis. Our algorithms are based on the online mirror descent framework, but importantly with an innovative combination of several techniques. Notably, while earlier works use optimistic biased loss estimators for achieving high-probability bounds, we find it important to use a pessimistic one for nodes without a self-loop in a strongly observable graph.
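The abstract contrasts optimistic and pessimistic biased loss estimators under graph feedback. As a rough illustration of the underlying importance weighting, and not the paper's actual algorithm, the sketch below estimates losses on a feedback graph; the graph encoding via `revealed_by` and the single `bias` knob are assumptions made for this example.

```python
def iw_loss_estimates(losses, played, p, revealed_by, bias=0.0):
    """Importance-weighted loss estimates on a feedback graph.

    revealed_by[i] is the set of arms whose play reveals the loss of arm i;
    arm i is observed iff the played arm lies in that set.  A positive
    `bias` in the denominator shrinks the estimate (the optimistic,
    implicit-exploration direction); a pessimistic estimator would instead
    inflate it.
    """
    n = len(losses)
    # P(observe arm i) under the play distribution p
    obs_prob = [sum(p[j] for j in revealed_by[i]) for i in range(n)]
    return [
        losses[i] / (obs_prob[i] + bias) if played in revealed_by[i] else 0.0
        for i in range(n)
    ]
```

With a full-feedback graph every observation probability is 1, so the estimate coincides with the true loss vector; with a bandit (self-loop only) graph, only the played arm gets a nonzero, inflated estimate.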
Citations: 2
Max-Quantile Grouped Infinite-Arm Bandits
International Conference on Algorithmic Learning Theory Pub Date : 2022-10-04 DOI: 10.48550/arXiv.2210.01295
Ivan Lau, Yan Hao Ling, Mayank Shrivastava, J. Scarlett
In this paper, we consider a bandit problem in which there are a number of groups, each consisting of infinitely many arms. Whenever a new arm is requested from a given group, its mean reward is drawn from an unknown reservoir distribution (different for each group), and the uncertainty in the arm's mean reward can only be reduced via subsequent pulls of the arm. The goal is to identify the infinite-arm group whose reservoir distribution has the highest $(1-\alpha)$-quantile (e.g., the median if $\alpha = \frac{1}{2}$), using as few total arm pulls as possible. We introduce a two-step algorithm that first requests a fixed number of arms from each group and then runs a finite-arm grouped max-quantile bandit algorithm. We characterize both the instance-dependent and worst-case regret, and provide a matching lower bound for the latter, while discussing various strengths, weaknesses, algorithmic improvements, and potential lower bounds associated with our instance-dependent upper bounds.
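The two-step structure described above can be sketched in a few lines. This is an illustration only, not the paper's algorithm: the quantile rule is a simple order statistic, pulls are passed in as a user-supplied function, and all names here are hypothetical.

```python
def empirical_quantile(values, alpha):
    """Empirical (1 - alpha)-quantile of a list via a simple order-statistic rule."""
    v = sorted(values)
    idx = min(len(v) - 1, int((1 - alpha) * len(v)))
    return v[idx]

def select_group(reservoirs, alpha, arms_per_group, pulls_per_arm, pull):
    """Two-step scheme: request a fixed number of arms per group, estimate each
    arm's mean from repeated pulls, then pick the group with the highest
    empirical (1 - alpha)-quantile of the estimated arm means."""
    best_g, best_q = 0, float("-inf")
    for g, arm_means in enumerate(reservoirs):
        estimates = [
            sum(pull(mu) for _ in range(pulls_per_arm)) / pulls_per_arm
            for mu in arm_means[:arms_per_group]
        ]
        q = empirical_quantile(estimates, alpha)
        if q > best_q:
            best_g, best_q = g, q
    return best_g
```

With noiseless pulls (`pull = lambda mu: mu`) the procedure simply picks the group whose requested arms have the highest empirical quantile of means.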
Citations: 0
Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States
International Conference on Algorithmic Learning Theory Pub Date : 2022-10-03 DOI: 10.48550/arXiv.2210.00997
C. Tsai, Hao-Chung Cheng, Yen-Huan Li
Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and by $d$ the parameter dimension.
1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O}(T^{2/3} d^{1/3})$ when $T > 4d/\log d$. This improves on the original $\tilde{O}(T^{3/4} d^{1/2})$ regret bound for $\widetilde{\text{EG}}$.
2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{Td})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms.
3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O}(\sqrt{Td})$. Its per-iteration time is shorter than that of all existing algorithms we know.
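Point 1 concerns exponentiated gradient for portfolio selection. A minimal sketch of one EG step on the log loss $-\log(w \cdot x_t)$ follows; the step size is arbitrary here and the $\widetilde{\text{EG}}$ modifications from the paper are not reproduced.

```python
import math

def eg_step(w, x, eta):
    """One exponentiated-gradient step for online portfolio selection.
    The loss is -log(w . x); its gradient at w is -x / (w . x), so each
    weight is multiplied by exp(-eta * grad_i) and the result renormalized."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    new = [wi * math.exp(eta * xi / dot) for wi, xi in zip(w, x)]
    z = sum(new)
    return [v / z for v in new]
```

After a round in which asset 0 outperforms asset 1, the update shifts mass toward asset 0 while keeping the weights on the simplex.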
Citations: 5
Robust Empirical Risk Minimization with Tolerance
International Conference on Algorithmic Learning Theory Pub Date : 2022-10-02 DOI: 10.48550/arXiv.2210.00635
Robi Bhattacharjee, Max Hopkins, Akash Kumar, Hantao Yu, Kamalika Chaudhuri
Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far short of answering the need. In this work we study the fundamental paradigm of (robust) $\textit{empirical risk minimization}$ (RERM), a simple process in which the learner outputs any hypothesis minimizing its training error. RERM famously fails to robustly learn VC classes (Montasser et al., 2019a), a bound we show extends even to 'nice' settings such as (bounded) halfspaces. As such, we study a recent relaxation of the robust model called $\textit{tolerant}$ robust learning (Ashtiani et al., 2022), where the output classifier is compared to the best achievable error over slightly larger perturbation sets. We show that under geometric niceness conditions, a natural tolerant variant of RERM is indeed sufficient for $\gamma$-tolerant robust learning of VC classes over $\mathbb{R}^d$, and requires only $\tilde{O}\left(\frac{VC(H)\, d \log\frac{D}{\gamma\delta}}{\epsilon^2}\right)$ samples for robustness regions of (maximum) diameter $D$.
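To make the RERM rule concrete, here is a toy sketch for 1-D threshold classifiers; it illustrates only the generic "output any hypothesis minimizing robust training error" recipe, not the paper's tolerant analysis, and the endpoint check is a simplification that is valid for thresholds but not for general hypotheses.

```python
def robust_error(h, data, r):
    """Fraction of points that h misclassifies somewhere in the radius-r interval.
    For 1-D threshold classifiers it suffices to check the interval endpoints."""
    bad = sum(1 for x, y in data if any(h(x + d) != y for d in (-r, 0.0, r)))
    return bad / len(data)

def tolerant_rerm(thresholds, data, r):
    """Robust ERM over threshold classifiers h_t(x) = 1[x >= t]: output any
    hypothesis minimizing the robust training error at radius r.  In the
    tolerant model its guarantee would be judged against a radius r' > r."""
    def make(t):
        return lambda x: 1 if x >= t else 0
    return min((make(t) for t in thresholds), key=lambda h: robust_error(h, data, r))
```

On a margin-separated sample, the returned threshold attains zero robust training error.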
Citations: 4
The Replicator Dynamic, Chain Components and the Response Graph
International Conference on Algorithmic Learning Theory Pub Date : 2022-09-30 DOI: 10.48550/arXiv.2209.15230
O. Biggar, I. Shames
In this paper we examine the relationship between the flow of the replicator dynamic (the continuum limit of Multiplicative Weights Update) and a game's response graph. We settle an open problem, establishing that under the replicator, sink chain components, a topological notion of the long-run outcome of a dynamical system, always exist and are approximated by the sink connected components of the game's response graph. More specifically, each sink chain component contains a sink connected component of the response graph, as well as all mixed strategy profiles whose support consists of pure profiles in the same connected component, a set we call the content of the connected component. As a corollary, all profiles are chain recurrent in games with strongly connected response graphs. In any two-player game sharing a response graph with a zero-sum game, the sink chain component is unique. In two-player zero-sum and potential games the sink chain components and sink connected components are in a one-to-one correspondence, and we conjecture that this holds in all games.
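For readers unfamiliar with the flow in question, a discrete-time Euler approximation of the replicator dynamic $\dot{x}_i = x_i((Ax)_i - x^\top Ax)$ can be sketched as follows; this is a generic numerical illustration, not anything specific to the paper's analysis.

```python
def replicator_step(x, A, dt):
    """One Euler step of the replicator dynamic x_i' = x_i((Ax)_i - x.Ax)
    for payoff matrix A, renormalized to guard against numerical drift
    off the probability simplex."""
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(x[i] * Ax[i] for i in range(n))
    new = [x[i] + dt * x[i] * (Ax[i] - avg) for i in range(n)]
    z = sum(new)
    return [v / z for v in new]
```

Starting from the uniform mixed profile in a game where strategy 0 strictly dominates, one step already shifts mass toward the dominant strategy.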
Citations: 2
On The Computational Complexity of Self-Attention
International Conference on Algorithmic Learning Theory Pub Date : 2022-09-11 DOI: 10.48550/arXiv.2209.04881
Feyza Duman Keles, Pruthuvi Maheshakya Wijewardena, C. Hegde
Transformer architectures have led to remarkable progress in many state-of-the-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time and space complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention mechanisms to achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time, at the cost of having an exponential dependence on the polynomial order.
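The quadratic cost under discussion is visible directly in the naive computation, sketched below with plain lists rather than a tensor library; this is a standard textbook formulation, not code from the paper.

```python
import math

def self_attention(Q, K, V):
    """Naive dot-product self-attention.  The double loop over the n positions
    is the source of the Theta(n^2) time and memory cost the paper studies."""
    n, d = len(Q), len(Q[0])
    out = []
    for i in range(n):                       # n rows ...
        scores = [sum(Q[i][t] * K[j][t] for t in range(d)) / math.sqrt(d)
                  for j in range(n)]         # ... times n scores each
        m = max(scores)                      # max-shifted softmax for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * V[j][t] for j, e in enumerate(exps))
                    for t in range(len(V[0]))])
    return out
```

A quick sanity check: with all-zero keys, every score is zero, the softmax weights are uniform, and each output row is the mean of the value rows.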
Citations: 19
Best-of-Both-Worlds Algorithms for Partial Monitoring
International Conference on Algorithmic Learning Theory Pub Date : 2022-07-29 DOI: 10.48550/arXiv.2207.14550
Taira Tsuchiya, Shinji Ito, J. Honda
This study considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded both in the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is $O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ in the stochastic regime and $O(m k^{2/3} \sqrt{T \log(T) \log k_{\Pi}})$ in the adversarial regime, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum suboptimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is $O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ in the stochastic regime and $O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$ in the adversarial regime, where $c_{\mathcal{G}}$ is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
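The follow-the-regularized-leader framework mentioned above has a well-known closed form when the regularizer is the negative entropy over the simplex: the argmin is an exponential-weights distribution on the cumulative losses. The sketch below shows only that generic FTRL step; the paper's exploration-by-optimization machinery and adaptive learning rates are omitted.

```python
import math

def ftrl_entropy(cum_losses, eta):
    """Follow-the-regularized-leader over the probability simplex with the
    negative-entropy regularizer.  The minimizer of
    eta * <p, L> - H(p) is p_i proportional to exp(-eta * L_i)."""
    m = min(cum_losses)
    w = [math.exp(-eta * (L - m)) for L in cum_losses]  # shift for stability
    z = sum(w)
    return [v / z for v in w]
```

With zero cumulative losses the play distribution is uniform, and actions with smaller cumulative loss always receive more mass.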
Citations: 5
Online Learning with Off-Policy Feedback
International Conference on Algorithmic Learning Theory Pub Date : 2022-07-18 DOI: 10.48550/arXiv.2207.08956
Germano Gabbianelli, M. Papini, Gergely Neu
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (the behavior policy). Instead of a standard exploration-exploitation dilemma, the learner has to face another challenge in this setting: due to limited observations outside of their control, the learner may not be able to estimate the value of each policy equally well. To address this issue, we propose a set of algorithms that guarantee regret bounds that scale with a natural notion of mismatch between any comparator policy and the behavior policy, achieving improved performance against comparators that are well-covered by the observations. We also provide an extension to the setting of adversarial linear contextual bandits, and verify the theoretical guarantees via a set of experiments. Our key algorithmic idea is adapting the notion of pessimistic reward estimators that has recently been popular in the context of off-policy reinforcement learning.
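To illustrate what a pessimistic reward estimator looks like in the off-policy setting, here is a toy importance-weighted value estimate with a coverage-dependent penalty. The specific penalty form and all names are assumptions for this sketch; the paper's estimators are not reproduced here.

```python
def pessimistic_value_estimate(log, pi_b, pi_e, beta=0.0):
    """Importance-weighted estimate of an evaluation policy's value from data
    generated by a behavior policy, shrunk pessimistically.  `log` is a list
    of (action, reward) pairs drawn under pi_b; pi_b and pi_e are action
    probability vectors."""
    n = len(log)
    est = sum(pi_e[a] / pi_b[a] * r for a, r in log) / n
    # penalty grows with the worst importance ratio: poor coverage of the
    # evaluation policy by the behavior policy means more caution
    penalty = beta * max(pi_e[a] / pi_b[a] for a in range(len(pi_b))) / n
    return est - penalty
```

With `beta=0` this reduces to the plain unbiased importance-weighted estimate; any positive `beta` strictly lowers the estimate, so poorly covered comparators are handicapped rather than overvalued.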
Citations: 3
Optimistic PAC Reinforcement Learning: the Instance-Dependent View
International Conference on Algorithmic Learning Theory Pub Date : 2022-07-12 DOI: 10.48550/arXiv.2207.05852
Andrea Tirinzoni, Aymen Al Marjani, E. Kaufmann
Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2021) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near-optimal. On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.
Citations: 6
Reconstructing Ultrametric Trees from Noisy Experiments
International Conference on Algorithmic Learning Theory Pub Date : 2022-06-15 DOI: 10.48550/arXiv.2206.07672
Eshwar Ram Arunachaleswaran, Anindya De, Sampath Kannan
The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of 'experiments' on triples of leaves $(a,b,c)$, which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can equivalently be seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these 'experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are very close to each other in the tree, the result of the experiment should be close to uniformly random. Motivated by this, we consider a model where the noise on any triple depends only on the three pairwise distances (referred to as distance-based noise). Our results are the following:
1. Suppose the length of every edge in the unknown tree is at least a $\tilde{O}(\frac{1}{\sqrt{n}})$ fraction of the length of a root-leaf path. Then we give an efficient algorithm to reconstruct the topology of the tree for a broad family of distance-based noise models. Further, we show that if the edges are asymptotically shorter, then topology reconstruction is information-theoretically impossible.
2. For a specific distance-based noise model, which we refer to as the homogeneous noise model, we show that the edge weights can also be approximately reconstructed under the same quantitative lower bound on the edge lengths.
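The noiseless experiment that the noise model perturbs can be sketched in one function: on an ultrametric, the pair with the deepest least common ancestor is exactly the closest pair. The matrix encoding below is an assumption made for this illustration.

```python
def triple_experiment(dist, a, b, c):
    """Noiseless triple experiment on an ultrametric tree: return the pair of
    leaves with the deepest least common ancestor, i.e. the closest pair under
    the leaf-to-leaf distance matrix `dist`."""
    pairs = [(a, b), (a, c), (b, c)]
    return min(pairs, key=lambda p: dist[p[0]][p[1]])
```

For three leaves where 0 and 1 branch off most recently, the experiment returns the pair (0, 1); the noisy model would return one of the other pairs with some distance-dependent probability.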
Citations: 0