International Conference on Algorithmic Learning Theory: Latest Publications

On Computable Online Learning
Niki Hasrati, S. Ben-David
International Conference on Algorithmic Learning Theory · Pub Date: 2023-02-08 · DOI: 10.48550/arXiv.2302.04357
Abstract: We initiate a study of computable online (c-online) learning, which we analyze under varying requirements for "optimality" in terms of the mistake bound. Our main contribution is to give a necessary and sufficient condition for optimal c-online learning and to show that the Littlestone dimension no longer characterizes the optimal mistake bound of c-online learning. Furthermore, we introduce anytime optimal (a-optimal) online learning, a more natural conceptualization of "optimality" and a generalization of Littlestone's Standard Optimal Algorithm. We show the existence of a computational separation between a-optimal and optimal online learning, proving that a-optimal online learning is computationally more difficult. Finally, we consider online learning with no requirements for optimality, and show, under a weaker notion of computability, that the finiteness of the Littlestone dimension no longer characterizes whether a class is c-online learnable with a finite mistake bound. A potential avenue for strengthening this result is suggested by exploring the relationship between c-online and CPAC learning, where we show that c-online learning is as difficult as improper CPAC learning.
Citations: 3
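For context, the Standard Optimal Algorithm (SOA) that the paper generalizes predicts, at each round, the label whose induced version space has the larger Littlestone dimension. Below is a minimal, exponential-time Python sketch for a finite class over a finite domain, with hypotheses represented as tuples; it illustrates the textbook algorithm only, not the paper's computability constructions.

```python
def ldim(H, X):
    """Littlestone dimension of a finite class H over a finite domain X.
    Convention: ldim of the empty class is -1. Exponential time; a sketch only."""
    if len(H) <= 1:
        return 0 if H else -1
    best = 0
    for x in X:
        H0 = [h for h in H if h[x] == 0]
        H1 = [h for h in H if h[x] == 1]
        if H0 and H1:  # x splits the class: one more mistake is forceable
            best = max(best, 1 + min(ldim(H0, X), ldim(H1, X)))
    return best

def soa_predict(V, X, x):
    """SOA: predict the label whose restricted version space has larger ldim."""
    V0 = [h for h in V if h[x] == 0]
    V1 = [h for h in V if h[x] == 1]
    return 0 if ldim(V0, X) >= ldim(V1, X) else 1

# Example: thresholds on a 3-point domain, h_t(x) = 1 iff x >= t
X = [0, 1, 2]
V = [tuple(int(x >= t) for x in X) for t in range(4)]
x = 1
y_hat = soa_predict(V, X, x)       # SOA's prediction for instance x
y = 1                              # adversary reveals the true label
V = [h for h in V if h[x] == y]    # shrink the version space
```

SOA makes at most Ldim(H) mistakes on realizable sequences; the paper's point is precisely that such optimal predictions need not be computable in general.
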
SQ Lower Bounds for Random Sparse Planted Vector Problem
Jingqiu Ding, Yiding Hua
International Conference on Algorithmic Learning Theory · Pub Date: 2023-01-26 · DOI: 10.48550/arXiv.2301.11124
Abstract: Consider the setting where a $\rho$-sparse Rademacher vector is planted in a random $d$-dimensional subspace of $\mathbb{R}^n$. A classical question is how to recover this planted vector given a random basis of this subspace. A recent result of [ZSWB21] showed that the lattice basis reduction algorithm can recover the planted vector when $n \geq d+1$. Although the algorithm is not expected to tolerate an inverse-polynomial amount of noise, this is surprising because it was previously shown that recovery cannot be achieved by low-degree polynomials when $n \ll \rho^2 d^{2}$ [MW21]. A natural question is whether we can derive a Statistical Query (SQ) lower bound matching the previous low-degree lower bound in [MW21]. This would (i) imply that the SQ lower bound can be surpassed by lattice-based algorithms, and (ii) predict the computational hardness when the planted vector is perturbed by an inverse-polynomial amount of noise. In this paper, we prove such an SQ lower bound. In particular, we show that a super-polynomial number of VSTAT queries is needed to solve the easier statistical testing problem when $n \ll \rho^2 d^{2}$ and $\rho \gg \frac{1}{\sqrt{d}}$. The most notable technique we use to derive the SQ lower bound is the almost-equivalence between SQ lower bounds and low-degree lower bounds [BBH+20, MW21].
Citations: 2
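To fix notation, here is one common way to generate an instance of the planted problem, as a minimal NumPy sketch. The sparsity normalization and the exact meaning of "random basis" vary across the literature, so treat the conventions below as assumptions rather than the paper's definitions.

```python
import numpy as np

def planted_instance(n, d, rho, seed=None):
    """Sample a basis of a random d-dim subspace of R^n containing a planted
    rho-sparse Rademacher vector (one common convention; a sketch only)."""
    rng = np.random.default_rng(seed)
    # rho-sparse Rademacher vector: each entry is 0 w.p. 1-rho,
    # else +-1/sqrt(rho*n) (unit norm in expectation)
    mask = rng.random(n) < rho
    signs = rng.choice([-1.0, 1.0], size=n)
    v = mask * signs / np.sqrt(rho * n)
    # d-1 independent Gaussian directions span the rest of the subspace
    G = rng.standard_normal((n, d - 1))
    Q, _ = np.linalg.qr(np.column_stack([v, G]))  # col(Q) contains v
    # mix v across columns with a random rotation to get a "random basis"
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q @ R, v  # observed basis (n x d), hidden planted vector
```
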
Complexity Analysis of a Countable-armed Bandit Problem
Anand Kalvit, A. Zeevi
International Conference on Algorithmic Learning Theory · Pub Date: 2023-01-18 · DOI: 10.48550/arXiv.2301.07243
Abstract: We consider a stochastic multi-armed bandit (MAB) problem motivated by "large" action spaces, endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is oblivious to the statistical properties of the reward distributions as well as the population-level distribution of arm-types, and is also precluded from observing the type of an arm after play. We study the classical problem of minimizing the expected cumulative regret over a horizon of play $n$, and propose algorithms that achieve a rate-optimal finite-time instance-dependent regret of $\mathcal{O}(\log n)$. We also show that the instance-independent (minimax) regret is $\tilde{\mathcal{O}}(\sqrt{n})$ when $K=2$. While the order of regret and the complexity of the problem suggest a great degree of similarity to the classical MAB problem, properties of the performance bounds and salient aspects of algorithm design are quite distinct from the latter, as are the key primitives that determine complexity and the analysis tools needed to study them.
Citations: 0
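For readers unfamiliar with the baseline, a generic UCB1 routine over a finite pool of arms drawn from the population is sketched below. This is scaffolding for the setting only, not the paper's rate-optimal algorithm, which must additionally decide when to discard and resample arms since the pool itself is drawn from the arm reservoir.

```python
import math, random

def ucb1(pool, horizon):
    """UCB1 on a finite pool of arms; pool[i]() returns a reward in [0, 1].
    In the countable-armed setting the pool is first drawn from the population."""
    n = len(pool)
    counts = [0] * n
    means = [0.0] * n
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1  # play each arm once to initialize
        else:
            i = max(range(n),
                    key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]))
        r = pool[i]()
        total += r
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]  # running mean update
    return total

# e.g. K = 2 arm-types with means 0.5 and 0.6; each sampled arm has a hidden type
pool = [lambda p=random.choice([0.5, 0.6]): float(random.random() < p)
        for _ in range(4)]
```
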
Adversarial Online Multi-Task Reinforcement Learning
Quan Nguyen, Nishant A. Mehta
International Conference on Algorithmic Learning Theory · Pub Date: 2023-01-11 · DOI: 10.48550/arXiv.2301.04268
Abstract: We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set $\mathcal{M}$ of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $\mathcal{M}$ are well-separated under a notion of $\lambda$-separability, and show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of $\Omega(K\sqrt{DSAH})$ on the regret of any learning algorithm and an instance-specific lower bound of $\Omega(\frac{K}{\lambda^2})$ in sample complexity for a class of uniformly good cluster-then-learn algorithms. We use a novel construction called the 2-JAO MDP to prove the instance-specific lower bound. The lower bounds are complemented by a polynomial-time algorithm that obtains a $\tilde{O}(\frac{K}{\lambda^2})$ sample complexity guarantee for the clustering phase and a $\tilde{O}(\sqrt{MK})$ regret guarantee for the learning phase, indicating that the dependence on $K$ and $\frac{1}{\lambda^2}$ is tight.
Citations: 0
Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization
Mahdi Haghifam, Borja Rodríguez-Gálvez, R. Thobaben, M. Skoglund, Daniel M. Roy, G. Dziugaite
International Conference on Algorithmic Learning Theory · Pub Date: 2022-12-27 · DOI: 10.48550/arXiv.2212.13556
Abstract: To date, no "information-theoretic" frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds is able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy "surrogate" algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates either. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.
Citations: 8
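The "noisy surrogate" tactic mentioned in the abstract is easy to state concretely: run the gradient method as usual, then add Gaussian noise to the final iterate so that information-theoretic quantities (e.g., the mutual information between the output and the sample) become finite. A minimal sketch follows; the function name and signature are illustrative, not from the paper.

```python
import numpy as np

def noisy_surrogate_gd(grad, w0, eta, T, sigma, seed=None):
    """Plain gradient descent whose *final iterate* is corrupted by Gaussian
    noise -- the generic surrogate construction analyzed in this line of work.
    grad: (empirical) gradient of the training objective."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    for _ in range(T):
        w = w - eta * grad(w)                       # deterministic GD steps
    return w + sigma * rng.standard_normal(w.shape) # Gaussian perturbation
```
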
Variance-Reduced Conservative Policy Iteration
Naman Agarwal, Brian Bullins, Karan Singh
International Conference on Algorithmic Learning Theory · Pub Date: 2022-12-12 · DOI: 10.48550/arXiv.2212.06283
Abstract: We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and are thus unaffected by possibly non-linear or discontinuous parameterizations of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.
Citations: 2
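For orientation, classical Conservative Policy Iteration (Kakade and Langford, 2002) updates the policy by mixing it with an (approximately) greedy policy produced by an ERM oracle. A tabular sketch of one such update is below; the paper's variance-reduced variant changes how the greedy policy and advantage estimates are obtained from samples, which this sketch does not show.

```python
import numpy as np

def cpi_step(pi, Q, alpha):
    """One Conservative Policy Iteration update in tabular form (a sketch).
    pi: (S, A) stochastic policy; Q: (S, A) action-value estimates for pi;
    alpha: conservative mixing weight in (0, 1]."""
    greedy = np.zeros_like(pi)
    greedy[np.arange(pi.shape[0]), Q.argmax(axis=1)] = 1.0  # greedy/ERM oracle
    return (1.0 - alpha) * pi + alpha * greedy              # conservative mixture
```

The small mixing weight is what yields local convergence in the space of policies (as functions), independently of how the policy class is parameterized.
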
Linear Reinforcement Learning with Ball Structure Action Space
Zeyu Jia, Randy Jia, Dhruv Madeka, Dean Phillips Foster
International Conference on Algorithmic Learning Theory · Pub Date: 2022-11-14 · DOI: 10.48550/arXiv.2211.07419
Abstract: We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e., assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping. Unfortunately, based on this assumption alone, the worst-case sample complexity has been shown to be exponential, even under a generative model. Instead of making further assumptions on the MDP or the value functions, we assume that our action space is such that there always exist playable actions to explore any direction of the feature space. We formalize this assumption as a "ball structure" action space, and show that being able to freely explore the feature space allows for efficient RL. In particular, we propose a sample-efficient RL algorithm (BallRL) that learns an $\epsilon$-optimal policy using only $\tilde{O}\left(\frac{H^5 d^3}{\epsilon^3}\right)$ trajectories.
Citations: 1
Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization
Gergely Neu, Nneka Okolo
International Conference on Algorithmic Learning Theory · Pub Date: 2022-10-21 · DOI: 10.48550/arXiv.2210.12057
Abstract: We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions, and that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines at runtime.
Citations: 3
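The compactly represented output policy is a standard log-linear softmax over features, $\pi_\theta(a|s) \propto \exp(\phi(s,a)^\top \theta)$ with $\theta$ low-dimensional. A minimal sketch, assuming a feature matrix for the actions available at one state:

```python
import numpy as np

def softmax_policy(theta, phi):
    """Action distribution pi(.|s) proportional to exp(phi(s,a) . theta).
    theta: (d,) parameter vector; phi: (A, d) features of the actions at s."""
    logits = phi @ theta
    logits -= logits.max()        # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```

Executing the policy at runtime therefore costs only one matrix-vector product per state, which is the advantage the abstract contrasts with local planning subroutines.
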
Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path
Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, A. Lazaric
International Conference on Algorithmic Learning Theory · Pub Date: 2022-10-10 · DOI: 10.48550/arXiv.2210.04946
Abstract: We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_{\star}$, where any algorithm requires at least $\Omega(SAB_{\star}^3/(c_{\min}\epsilon^2))$ samples to return an $\epsilon$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{\min}=0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this result with lower bounds when prior knowledge of the hitting time of the optimal policy is available and when we restrict optimality by competing against policies with bounded hitting time. Finally, we design an algorithm with matching upper bounds in these cases. This settles the sample complexity of learning $\epsilon$-optimal policies in SSPs with generative models. We also initiate the study of learning $\epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general. On the other hand, efficient learning is possible if we assume the agent can directly reach the goal state from any state by paying a fixed cost. We then establish the first upper and lower bounds under this assumption. Finally, using similar analytic tools, we prove that horizon-free regret is impossible in SSPs under general costs, resolving an open problem in (Tarbouriech et al., 2021c).
Citations: 1
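As background, the planning problem underlying SSP is the fixed point of $V(s) = \min_a [c(s,a) + \sum_{s'} P(s'|s,a)V(s')]$ with $V(\mathrm{goal}) = 0$; the paper asks how many samples are needed to learn a near-minimizer when $P$ is unknown. A value-iteration sketch for a known model, under the assumptions stated in the docstring:

```python
import numpy as np

def ssp_value_iteration(P, c, goal, iters=1000, tol=1e-8):
    """Value iteration for a *known* SSP instance (a planning sketch only;
    the paper concerns the sample complexity of learning such a policy).
    P: (S, A, S) transition kernel; c: (S, A) nonnegative costs; goal: state index.
    Assumes a proper policy exists so the iteration converges."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = c + P @ V                 # (S, A): expected cost-to-go per action
        V_new = Q.min(axis=1)
        V_new[goal] = 0.0             # the goal is absorbing and cost-free
        if np.abs(V_new - V).max() < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmin(axis=1)        # optimal values and a greedy policy
```
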
Fisher information lower bounds for sampling
Sinho Chewi, P. Gerber, Holden Lee, Chen Lu
International Conference on Algorithmic Learning Theory · Pub Date: 2022-10-05 · DOI: 10.48550/arXiv.2210.02482
Abstract: We prove two lower bounds for the complexity of non-log-concave sampling within the framework of Balasubramanian et al. (2022), who introduced the use of Fisher information (FI) bounds as a notion of approximate first-order stationarity in sampling. Our first lower bound shows that averaged LMC is optimal in the regime of large FI, by reducing the problem of finding stationary points in non-convex optimization to sampling. Our second lower bound shows that in the regime of small FI, obtaining an FI of at most $\varepsilon^2$ with respect to the target distribution requires $\text{poly}(1/\varepsilon)$ queries, which is surprising as it rules out the existence of high-accuracy algorithms (e.g., algorithms using Metropolis-Hastings filters) in this context.
Citations: 7
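For reference, averaged LMC, the algorithm shown to be optimal in the large-FI regime, is plain Langevin Monte Carlo whose output is a uniformly random iterate (so the guarantee is on the Fisher information averaged over iterates). A minimal sketch, with step-size tuning omitted:

```python
import numpy as np

def averaged_lmc(grad_V, x0, eta, T, seed=None):
    """Langevin Monte Carlo targeting pi proportional to exp(-V), returning a
    uniformly random iterate ('averaged' LMC in the sense of Balasubramanian
    et al., 2022). grad_V: gradient of the potential V; eta: step size."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    iterates = []
    for _ in range(T):
        # discretized Langevin step: drift down the potential plus injected noise
        x = x - eta * grad_V(x) + np.sqrt(2 * eta) * rng.standard_normal(x.shape)
        iterates.append(x.copy())
    return iterates[rng.integers(T)]  # output a uniformly random iterate
```
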