{"title":"受控扩散过程的近似Q学习及其近最优性","authors":"Erhan Bayraktar, A. D. Kara","doi":"10.1137/22m1484201","DOIUrl":null,"url":null,"abstract":"We study a Q learning algorithm for continuous time stochastic control problems. The proposed algorithm uses the sampled state process by discretizing the state and control action spaces under piece-wise constant control processes. We show that the algorithm converges to the optimality equation of a finite Markov decision process (MDP). Using this MDP model, we provide an upper bound for the approximation error for the optimal value function of the continuous time control problem. Furthermore, we present provable upper-bounds for the performance loss of the learned control process compared to the optimal admissible control process of the original problem. The provided error upper-bounds are functions of the time and space discretization parameters, and they reveal the effect of different levels of the approximation: (i) approximation of the continuous time control problem by an MDP, (ii) use of piece-wise constant control processes, (iii) space discretization. Finally, we state a time complexity bound for the proposed algorithm as a function of the time and space discretization parameters.","PeriodicalId":74797,"journal":{"name":"SIAM journal on mathematics of data science","volume":"91 1","pages":"615-638"},"PeriodicalIF":1.9000,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality\",\"authors\":\"Erhan Bayraktar, A. D. Kara\",\"doi\":\"10.1137/22m1484201\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study a Q learning algorithm for continuous time stochastic control problems. The proposed algorithm uses the sampled state process by discretizing the state and control action spaces under piece-wise constant control processes. We show that the algorithm converges to the optimality equation of a finite Markov decision process (MDP). Using this MDP model, we provide an upper bound for the approximation error for the optimal value function of the continuous time control problem. Furthermore, we present provable upper-bounds for the performance loss of the learned control process compared to the optimal admissible control process of the original problem. The provided error upper-bounds are functions of the time and space discretization parameters, and they reveal the effect of different levels of the approximation: (i) approximation of the continuous time control problem by an MDP, (ii) use of piece-wise constant control processes, (iii) space discretization. 
Finally, we state a time complexity bound for the proposed algorithm as a function of the time and space discretization parameters.\",\"PeriodicalId\":74797,\"journal\":{\"name\":\"SIAM journal on mathematics of data science\",\"volume\":\"91 1\",\"pages\":\"615-638\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2022-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SIAM journal on mathematics of data science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1137/22m1484201\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS, APPLIED\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIAM journal on mathematics of data science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/22m1484201","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Approximate Q Learning for Controlled Diffusion Processes and Its Near Optimality
We study a Q learning algorithm for continuous-time stochastic control problems. The proposed algorithm uses the sampled state process obtained by discretizing the state and control action spaces under piecewise-constant control processes. We show that the algorithm converges to the optimality equation of a finite Markov decision process (MDP). Using this MDP model, we provide an upper bound on the approximation error for the optimal value function of the continuous-time control problem. Furthermore, we present provable upper bounds on the performance loss of the learned control process relative to the optimal admissible control process of the original problem. The provided error upper bounds are functions of the time and space discretization parameters, and they reveal the effect of each level of approximation: (i) approximation of the continuous-time control problem by an MDP, (ii) use of piecewise-constant control processes, and (iii) space discretization. Finally, we state a time complexity bound for the proposed algorithm as a function of the time and space discretization parameters.
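To make the construction concrete, the following is a minimal, hypothetical sketch of the kind of scheme the abstract describes: tabular Q learning run on the sampled state process of a controlled diffusion, with the control held piecewise constant over intervals of length h and with the state and action spaces quantized onto finite grids. The specific dynamics (a one-dimensional controlled SDE), cost, discount rate, and all grid parameters below are illustrative assumptions, not the authors' setup.

```python
# Hedged sketch: tabular Q learning on a time/space-discretized controlled diffusion.
# All model and grid choices here are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)

h = 0.1                                # time step; the control is held constant on [t, t+h)
beta = 0.5                             # discount rate of the continuous-time problem (assumed)
gamma = np.exp(-beta * h)              # induced discount factor of the approximating finite MDP
x_grid = np.linspace(-2.0, 2.0, 41)    # finite state grid (space discretization)
u_grid = np.linspace(-1.0, 1.0, 5)     # finite control-action grid

def step(x, u):
    """One Euler-Maruyama step of an illustrative controlled diffusion dX = u dt + 0.5 dW."""
    return x + u * h + 0.5 * np.sqrt(h) * rng.standard_normal()

def cost(x, u):
    """Running cost accumulated over one interval of length h (quadratic, for illustration)."""
    return (x ** 2 + 0.1 * u ** 2) * h

def nearest(grid, v):
    """Project a continuous value onto its nearest grid point (quantization)."""
    return int(np.argmin(np.abs(grid - v)))

Q = np.zeros((len(x_grid), len(u_grid)))
alpha, eps = 0.1, 0.1                  # learning rate and exploration probability
i = nearest(x_grid, 0.0)               # quantized initial state

for _ in range(200_000):
    # epsilon-greedy action selection on the finite action grid
    a = int(rng.integers(len(u_grid))) if rng.random() < eps else int(np.argmin(Q[i]))
    x_next = step(x_grid[i], u_grid[a])
    j = nearest(x_grid, np.clip(x_next, x_grid[0], x_grid[-1]))
    # Q-learning update toward the optimality equation of the induced finite MDP
    target = cost(x_grid[i], u_grid[a]) + gamma * Q[j].min()
    Q[i, a] += alpha * (target - Q[i, a])
    i = j

# Learned piecewise-constant control: apply u_grid[argmin Q[state]] on each interval of length h.
policy = u_grid[np.argmin(Q, axis=1)]
print(policy)
```

In this sketch the three approximation levels named in the abstract appear explicitly: the discount factor gamma and one-step cost define the approximating MDP, the control is applied piecewise constantly over each interval of length h, and the projection onto x_grid is the space discretization. Refining h and the grids corresponds to tightening the error bounds the paper derives.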