{"title":"双时标准随机逼近的马尔可夫基础扩展版","authors":"Caio Kalil Lauand, Sean Meyn","doi":"arxiv-2409.07842","DOIUrl":null,"url":null,"abstract":"Many machine learning and optimization algorithms can be cast as instances of\nstochastic approximation (SA). The convergence rate of these algorithms is\nknown to be slow, with the optimal mean squared error (MSE) of order\n$O(n^{-1})$. In prior work it was shown that MSE bounds approaching $O(n^{-4})$\ncan be achieved through the framework of quasi-stochastic approximation (QSA);\nessentially SA with careful choice of deterministic exploration. These results\nare extended to two time-scale algorithms, as found in policy gradient methods\nof reinforcement learning and extremum seeking control. The extensions are made\npossible in part by a new approach to analysis, allowing for the interpretation\nof two timescale algorithms as instances of single timescale QSA, made possible\nby the theory of negative Lyapunov exponents for QSA. The general theory is\nillustrated with applications to extremum seeking control (ESC).","PeriodicalId":501286,"journal":{"name":"arXiv - MATH - Optimization and Control","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Markovian Foundations for Quasi-Stochastic Approximation in Two Timescales: Extended Version\",\"authors\":\"Caio Kalil Lauand, Sean Meyn\",\"doi\":\"arxiv-2409.07842\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many machine learning and optimization algorithms can be cast as instances of\\nstochastic approximation (SA). The convergence rate of these algorithms is\\nknown to be slow, with the optimal mean squared error (MSE) of order\\n$O(n^{-1})$. In prior work it was shown that MSE bounds approaching $O(n^{-4})$\\ncan be achieved through the framework of quasi-stochastic approximation (QSA);\\nessentially SA with careful choice of deterministic exploration. These results\\nare extended to two time-scale algorithms, as found in policy gradient methods\\nof reinforcement learning and extremum seeking control. The extensions are made\\npossible in part by a new approach to analysis, allowing for the interpretation\\nof two timescale algorithms as instances of single timescale QSA, made possible\\nby the theory of negative Lyapunov exponents for QSA. 
The general theory is\\nillustrated with applications to extremum seeking control (ESC).\",\"PeriodicalId\":501286,\"journal\":{\"name\":\"arXiv - MATH - Optimization and Control\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - MATH - Optimization and Control\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07842\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Optimization and Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Markovian Foundations for Quasi-Stochastic Approximation in Two Timescales: Extended Version
Many machine learning and optimization algorithms can be cast as instances of stochastic approximation (SA). The convergence rate of these algorithms is known to be slow, with optimal mean squared error (MSE) of order $O(n^{-1})$. Prior work showed that MSE bounds approaching $O(n^{-4})$ can be achieved through the framework of quasi-stochastic approximation (QSA): essentially SA with a careful choice of deterministic exploration. These results are extended here to two-timescale algorithms, as found in policy gradient methods of reinforcement learning and in extremum seeking control. The extensions are made possible in part by a new approach to analysis that allows two-timescale algorithms to be interpreted as instances of single-timescale QSA, building on the theory of negative Lyapunov exponents for QSA. The general theory is illustrated with applications to extremum seeking control (ESC).
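To make the QSA idea concrete, the following is a minimal sketch (not the authors' code) of quasi-stochastic approximation applied to extremum seeking: the random exploration of standard SA is replaced by a deterministic sinusoidal probing signal, and the parameter follows an Euler discretization of the QSA ODE. The objective L, the gains, and the probing frequency below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of quasi-stochastic approximation (QSA) for extremum seeking.
# Deterministic sinusoidal probing plays the role of exploration; all constants
# (objective L, gains, frequency) are hypothetical and chosen for illustration.
import numpy as np

def L(theta):
    """Unknown objective to be minimized (hypothetical example)."""
    return (theta - 2.0) ** 2

def esc_qsa(theta0=0.0, n_steps=200_000, dt=1e-3, eps=0.1, omega=2 * np.pi * 5.0):
    """Euler discretization of the ESC/QSA dynamics
        d/dt Theta_t = -a_t * (2/eps) * sin(omega t) * L(Theta_t + eps * sin(omega t)),
    where the sinusoid is the deterministic (quasi-periodic) exploration signal."""
    theta = theta0
    for k in range(n_steps):
        t = k * dt
        a_t = 1.0 / (1.0 + t)                 # vanishing gain, as in standard SA
        probe = eps * np.sin(omega * t)       # deterministic probing perturbation
        # Demodulated measurement: its time-average approximates the gradient L'(theta)
        grad_est = (2.0 / eps) * np.sin(omega * t) * L(theta + probe)
        theta -= a_t * grad_est * dt
    return theta

if __name__ == "__main__":
    print(esc_qsa())  # approaches the minimizer theta* = 2.0
```

The averaging step is the key design choice: multiplying the probed measurement by the same sinusoid and averaging over a period leaves (approximately) the gradient of L, so the recursion behaves like gradient descent driven by a deterministic signal rather than by random noise.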