{"title":"On Computable Online Learning","authors":"Niki Hasrati, S. Ben-David","doi":"10.48550/arXiv.2302.04357","DOIUrl":"https://doi.org/10.48550/arXiv.2302.04357","url":null,"abstract":"We initiate a study of computable online (c-online) learning, which we analyze under varying requirements for\"optimality\"in terms of the mistake bound. Our main contribution is to give a necessary and sufficient condition for optimal c-online learning and show that the Littlestone dimension no longer characterizes the optimal mistake bound of c-online learning. Furthermore, we introduce anytime optimal (a-optimal) online learning, a more natural conceptualization of\"optimality\"and a generalization of Littlestone's Standard Optimal Algorithm. We show the existence of a computational separation between a-optimal and optimal online learning, proving that a-optimal online learning is computationally more difficult. Finally, we consider online learning with no requirements for optimality, and show, under a weaker notion of computability, that the finiteness of the Littlestone dimension no longer characterizes whether a class is c-online learnable with finite mistake bound. A potential avenue for strengthening this result is suggested by exploring the relationship between c-online and CPAC learning, where we show that c-online learning is as difficult as improper CPAC learning.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125877141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SQ Lower Bounds for Random Sparse Planted Vector Problem","authors":"Jingqiu Ding, Yiding Hua","doi":"10.48550/arXiv.2301.11124","DOIUrl":"https://doi.org/10.48550/arXiv.2301.11124","url":null,"abstract":"Consider the setting where a $rho$-sparse Rademacher vector is planted in a random $d$-dimensional subspace of $R^n$. A classical question is how to recover this planted vector given a random basis in this subspace. A recent result by [ZSWB21] showed that the Lattice basis reduction algorithm can recover the planted vector when $ngeq d+1$. Although the algorithm is not expected to tolerate inverse polynomial amount of noise, it is surprising because it was previously shown that recovery cannot be achieved by low degree polynomials when $nll rho^2 d^{2}$ [MW21]. A natural question is whether we can derive an Statistical Query (SQ) lower bound matching the previous low degree lower bound in [MW21]. This will - imply that the SQ lower bound can be surpassed by lattice based algorithms; - predict the computational hardness when the planted vector is perturbed by inverse polynomial amount of noise. In this paper, we prove such an SQ lower bound. In particular, we show that super-polynomial number of VSTAT queries is needed to solve the easier statistical testing problem when $nll rho^2 d^{2}$ and $rhogg frac{1}{sqrt{d}}$. The most notable technique we used to derive the SQ lower bound is the almost equivalence relationship between SQ lower bound and low degree lower bound [BBH+20, MW21].","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116735093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complexity Analysis of a Countable-armed Bandit Problem","authors":"Anand Kalvit, A. Zeevi","doi":"10.48550/arXiv.2301.07243","DOIUrl":"https://doi.org/10.48550/arXiv.2301.07243","url":null,"abstract":"We consider a stochastic multi-armed bandit (MAB) problem motivated by ``large'' action spaces, and endowed with a population of arms containing exactly $K$ arm-types, each characterized by a distinct mean reward. The decision maker is oblivious to the statistical properties of reward distributions as well as the population-level distribution of different arm-types, and is precluded also from observing the type of an arm after play. We study the classical problem of minimizing the expected cumulative regret over a horizon of play $n$, and propose algorithms that achieve a rate-optimal finite-time instance-dependent regret of $mathcal{O}left( log n right)$. We also show that the instance-independent (minimax) regret is $tilde{mathcal{O}}left( sqrt{n} right)$ when $K=2$. While the order of regret and complexity of the problem suggests a great degree of similarity to the classical MAB problem, properties of the performance bounds and salient aspects of algorithm design are quite distinct from the latter, as are the key primitives that determine complexity along with the analysis tools needed to study them.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128870221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adversarial Online Multi-Task Reinforcement Learning","authors":"Quan Nguyen, Nishant A. Mehta","doi":"10.48550/arXiv.2301.04268","DOIUrl":"https://doi.org/10.48550/arXiv.2301.04268","url":null,"abstract":"We consider the adversarial online multi-task reinforcement learning setting, where in each of $K$ episodes the learner is given an unknown task taken from a finite set of $M$ unknown finite-horizon MDP models. The learner's objective is to minimize its regret with respect to the optimal policy for each task. We assume the MDPs in $mathcal{M}$ are well-separated under a notion of $lambda$-separability, and show that this notion generalizes many task-separability notions from previous works. We prove a minimax lower bound of $Omega(Ksqrt{DSAH})$ on the regret of any learning algorithm and an instance-specific lower bound of $Omega(frac{K}{lambda^2})$ in sample complexity for a class of uniformly-good cluster-then-learn algorithms. We use a novel construction called 2-JAO MDP for proving the instance-specific lower bound. The lower bounds are complemented with a polynomial time algorithm that obtains $tilde{O}(frac{K}{lambda^2})$ sample complexity guarantee for the clustering phase and $tilde{O}(sqrt{MK})$ regret guarantee for the learning phase, indicating that the dependency on $K$ and $frac{1}{lambda^2}$ is tight.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116291437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limitations of Information-Theoretic Generalization Bounds for Gradient Descent Methods in Stochastic Convex Optimization","authors":"Mahdi Haghifam, Borja Rodr'iguez-G'alvez, R. Thobaben, M. Skoglund, Daniel M. Roy, G. Dziugaite","doi":"10.48550/arXiv.2212.13556","DOIUrl":"https://doi.org/10.48550/arXiv.2212.13556","url":null,"abstract":"To date, no\"information-theoretic\"frameworks for reasoning about generalization error have been shown to establish minimax rates for gradient descent in the setting of stochastic convex optimization. In this work, we consider the prospect of establishing such rates via several existing information-theoretic frameworks: input-output mutual information bounds, conditional mutual information bounds and variants, PAC-Bayes bounds, and recent conditional variants thereof. We prove that none of these bounds are able to establish minimax rates. We then consider a common tactic employed in studying gradient methods, whereby the final iterate is corrupted by Gaussian noise, producing a noisy\"surrogate\"algorithm. We prove that minimax rates cannot be established via the analysis of such surrogates. Our results suggest that new ideas are required to analyze gradient descent using information-theoretic techniques.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133903912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variance-Reduced Conservative Policy Iteration","authors":"Naman Agarwal, Brian Bullins, Karan Singh","doi":"10.48550/arXiv.2212.06283","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06283","url":null,"abstract":"We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing a $varepsilon$-functional local optimum from $O(varepsilon^{-4})$ to $O(varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $varepsilon$-global optimality after sampling $O(varepsilon^{-2})$ times, improving upon the previously established $O(varepsilon^{-3})$ sample requirement.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"354 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122798224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Reinforcement Learning with Ball Structure Action Space","authors":"Zeyu Jia, Randy Jia, Dhruv Madeka, Dean Phillips Foster","doi":"10.48550/arXiv.2211.07419","DOIUrl":"https://doi.org/10.48550/arXiv.2211.07419","url":null,"abstract":"We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e. assuming the optimal action-value function is linear in a known $d$-dimensional feature mapping. Unfortunately, however, based on only this assumption, the worst case sample complexity has been shown to be exponential, even under a generative model. Instead of making further assumptions on the MDP or value functions, we assume that our action space is such that there always exist playable actions to explore any direction of the feature space. We formalize this assumption as a ``ball structure'' action space, and show that being able to freely explore the feature space allows for efficient RL. In particular, we propose a sample-efficient RL algorithm (BallRL) that learns an $epsilon$-optimal policy using only $tilde{O}left(frac{H^5d^3}{epsilon^3}right)$ number of trajectories.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125941211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization","authors":"Gergely Neu, Nneka Okolo","doi":"10.48550/arXiv.2210.12057","DOIUrl":"https://doi.org/10.48550/arXiv.2210.12057","url":null,"abstract":"We propose a new stochastic primal-dual optimization algorithm for planning in a large discounted Markov decision process with a generative model and linear function approximation. Assuming that the feature map approximately satisfies standard realizability and Bellman-closedness conditions and also that the feature vectors of all state-action pairs are representable as convex combinations of a small core set of state-action pairs, we show that our method outputs a near-optimal policy after a polynomial number of queries to the generative model. Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector, and does not need to execute computationally expensive local planning subroutines in runtime.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116721604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reaching Goals is Hard: Settling the Sample Complexity of the Stochastic Shortest Path","authors":"Liyu Chen, Andrea Tirinzoni, Matteo Pirotta, A. Lazaric","doi":"10.48550/arXiv.2210.04946","DOIUrl":"https://doi.org/10.48550/arXiv.2210.04946","url":null,"abstract":"We study the sample complexity of learning an $epsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{min}$, and maximum expected cost of the optimal policy over all states $B_{star}$, where any algorithm requires at least $Omega(SAB_{star}^3/(c_{min}epsilon^2))$ samples to return an $epsilon$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{min}=0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this result with lower bounds when prior knowledge of the hitting time of the optimal policy is available and when we restrict optimality by competing against policies with bounded hitting time. Finally, we design an algorithm with matching upper bounds in these cases. This settles the sample complexity of learning $epsilon$-optimal polices in SSP with generative models. We also initiate the study of learning $epsilon$-optimal policies without access to a generative model (i.e., the so-called best-policy identification problem), and show that sample-efficient learning is impossible in general. On the other hand, efficient learning can be made possible if we assume the agent can directly reach the goal state from any state by paying a fixed cost. We then establish the first upper and lower bounds under this assumption. Finally, using similar analytic tools, we prove that horizon-free regret is impossible in SSPs under general costs, resolving an open problem in (Tarbouriech et al., 2021c).","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130666906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fisher information lower bounds for sampling","authors":"Sinho Chewi, P. Gerber, Holden Lee, Chen Lu","doi":"10.48550/arXiv.2210.02482","DOIUrl":"https://doi.org/10.48550/arXiv.2210.02482","url":null,"abstract":"We prove two lower bounds for the complexity of non-log-concave sampling within the framework of Balasubramanian et al. (2022), who introduced the use of Fisher information (FI) bounds as a notion of approximate first-order stationarity in sampling. Our first lower bound shows that averaged LMC is optimal for the regime of large FI by reducing the problem of finding stationary points in non-convex optimization to sampling. Our second lower bound shows that in the regime of small FI, obtaining a FI of at most $varepsilon^2$ from the target distribution requires $text{poly}(1/varepsilon)$ queries, which is surprising as it rules out the existence of high-accuracy algorithms (e.g., algorithms using Metropolis-Hastings filters) in this context.","PeriodicalId":267197,"journal":{"name":"International Conference on Algorithmic Learning Theory","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133028227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}