{"title":"Accelerated Multi-Time-Scale Stochastic Approximation: Optimal Complexity and Applications in Reinforcement Learning and Multi-Agent Games","authors":"Sihan Zeng, Thinh T. Doan","doi":"arxiv-2409.07767","DOIUrl":null,"url":null,"abstract":"Multi-time-scale stochastic approximation is an iterative algorithm for\nfinding the fixed point of a set of $N$ coupled operators given their noisy\nsamples. It has been observed that due to the coupling between the decision\nvariables and noisy samples of the operators, the performance of this method\ndecays as $N$ increases. In this work, we develop a new accelerated variant of\nmulti-time-scale stochastic approximation, which significantly improves the\nconvergence rates of its standard counterpart. Our key idea is to introduce\nauxiliary variables to dynamically estimate the operators from their samples,\nwhich are then used to update the decision variables. These auxiliary variables\nhelp not only to control the variance of the operator estimates but also to\ndecouple the sampling noise and the decision variables. This allows us to\nselect more aggressive step sizes to achieve an optimal convergence rate.\nSpecifically, under a strong monotonicity condition, we show that for any value\nof $N$ the $t^{\\text{th}}$ iterate of the proposed algorithm converges to the\ndesired solution at a rate $\\widetilde{O}(1/t)$ when the operator samples are\ngenerated from a single from Markov process trajectory. A second contribution of this work is to demonstrate that the objective of a\nrange of problems in reinforcement learning and multi-agent games can be\nexpressed as a system of fixed-point equations. As such, the proposed approach\ncan be used to design new learning algorithms for solving these problems. We\nillustrate this observation with numerical simulations in a multi-agent game\nand show the advantage of the proposed method over the standard\nmulti-time-scale stochastic approximation algorithm.","PeriodicalId":501286,"journal":{"name":"arXiv - MATH - Optimization and Control","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Optimization and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07767","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Multi-time-scale stochastic approximation is an iterative algorithm for
finding the fixed point of a set of $N$ coupled operators given their noisy
samples. It has been observed that, due to the coupling between the decision
variables and the noisy samples of the operators, the performance of this
method degrades as $N$ increases.
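Schematically, the goal is to find $x^\star = (x_1^\star, \ldots, x_N^\star)$
satisfying the coupled system
$$x_i = G_i(x_1, \ldots, x_N), \qquad i = 1, \ldots, N,$$
where each operator $G_i$ can be queried only through noisy samples (the
notation $G_i$ here is illustrative; the paper defines the precise operators
and sampling model). In this work, we develop a new accelerated variant of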
multi-time-scale stochastic approximation, which significantly improves upon
the convergence rate of its standard counterpart. Our key idea is to
introduce auxiliary variables that dynamically estimate the operators from
their samples and are then used to update the decision variables. These
auxiliary variables not only help control the variance of the operator
estimates but also decouple the sampling noise from the decision variables.
This allows us to
select more aggressive step sizes to achieve an optimal convergence rate.
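As a minimal sketch of this idea (the exact recursion and step-size choices
are specified in the paper; the form below is illustrative), an auxiliary
variable $f_i^t$ averages the operator samples, and the decision variable is
driven by this smoothed estimate rather than by the raw sample:
$$f_i^{t+1} = (1 - \lambda_t)\, f_i^t + \lambda_t\, G_i(x^t; \xi^t), \qquad
x_i^{t+1} = x_i^t + \alpha_{i,t} \big( f_i^{t+1} - x_i^t \big),$$
where $\xi^t$ is the (possibly Markovian) sample at time $t$ and $\lambda_t$,
$\alpha_{i,t}$ are step sizes on different time scales.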
Specifically, under a strong monotonicity condition, we show that for any
value of $N$ the $t^{\text{th}}$ iterate of the proposed algorithm converges
to the desired solution at a rate $\widetilde{O}(1/t)$ when the operator
samples are generated from a single trajectory of a Markov process.

A second contribution of this work is to demonstrate that the objective of a
range of problems in reinforcement learning and multi-agent games can be
expressed as a system of fixed-point equations. As such, the proposed approach
can be used to design new learning algorithms for solving these problems. We
illustrate this observation with numerical simulations in a multi-agent game
and show the advantage of the proposed method over the standard
multi-time-scale stochastic approximation algorithm.
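As a concrete instance of this viewpoint (a standard example, not
necessarily the formulation used in the paper), the Bellman optimality
equation $Q^\star(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s'}[\max_{a'}
Q^\star(s', a')]$ is itself a fixed-point equation, and actor-critic methods
couple a value-function fixed point with a policy fixed point, giving a
two-time-scale instance of the framework above. The sketch below contrasts a
standard multi-time-scale update with the accelerated idea on a synthetic
strongly monotone linear system; the problem data, step sizes, and update
forms are illustrative assumptions, not the paper's algorithm or experiments.

```python
import numpy as np

# Toy illustration only: a synthetic linear fixed-point problem
# x = G(x) = A x + b with A a contraction, observed through additive noise.
rng = np.random.default_rng(0)

N, d = 3, 2                                     # N coupled blocks of dimension d
A = 0.5 * np.eye(N * d) + 0.05 * rng.standard_normal((N * d, N * d))
b = rng.standard_normal(N * d)
x_star = np.linalg.solve(np.eye(N * d) - A, b)  # exact fixed point of G

def noisy_G(x):
    """One noisy sample of the operator G(x) = A x + b."""
    return A @ x + b + 0.5 * rng.standard_normal(x.shape)

T = 50_000
x = np.zeros(N * d)   # standard multi-time-scale SA iterate
y = np.zeros(N * d)   # accelerated-variant iterate
f = np.zeros(N * d)   # auxiliary operator estimate

for t in range(1, T + 1):
    # Standard scheme: each block moves on its own, progressively slower,
    # time scale directly from the raw noisy sample.
    sample = noisy_G(x)
    for i in range(N):
        blk = slice(i * d, (i + 1) * d)
        step = t ** -(1.0 - 0.4 * i / N)        # separated step-size scales
        x[blk] += step * (sample[blk] - x[blk])

    # Accelerated sketch: the auxiliary variable f smooths the samples, and
    # every block of y takes a uniform, more aggressive O(1/t) step.
    f += t ** -0.9 * (noisy_G(y) - f)
    y += (2.0 / (t + 1)) * (f - y)

print("standard    error:", np.linalg.norm(x - x_star))
print("accelerated error:", np.linalg.norm(y - x_star))
```

The only structural difference between the two loops is that the accelerated
iterate is driven by the smoothed estimate $f$ rather than by the raw sample,
which is what permits the uniform, more aggressive $O(1/t)$ step size.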