Value Iteration for Stochastic LQR With Convergence Guarantees

Jing Lai; Junlin Xiong; Yu Kang
IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 6, pp. 11640-11649
DOI: 10.1109/TNNLS.2025.3558738 | Published: 2025-04-28
https://ieeexplore.ieee.org/document/10977962/

This brief studies the discounted stochastic linear quadratic regulator (LQR) problem for systems subject to additive noise of unknown mean. A completely model-free (MF) value iteration (VI) algorithm is developed to learn the optimal control policy from offline system trajectories. The generated control policies are proven to converge, with high probability, to a small neighborhood of the optimal ones. In addition, an MF algorithm is proposed to learn a feasible discount factor. The proposed MF algorithms are illustrated through several examples.
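The brief's model-free algorithm itself is not reproduced here, but the discounted LQR value iteration it builds on can be sketched in its classical model-based form: repeatedly apply the Riccati-style Bellman update until the cost matrix converges. The system matrices, discount factor, and numerical values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def discounted_lqr_vi(A, B, Q, R, gamma, iters=500, tol=1e-10):
    """Value iteration for the discounted LQR Bellman equation.

    Iterates P <- Q + gamma*A'PA - gamma^2*A'PB (R + gamma*B'PB)^{-1} B'PA,
    the fixed point of V(x) = min_u x'Qx + u'Ru + gamma*E[V(Ax + Bu + w)].
    Returns the cost matrix P and gain K for the policy u = -K x.
    (Additive noise of fixed mean shifts V by a constant and does not
    affect P or K, which is why this sketch can ignore it.)
    """
    n = A.shape[0]
    P = np.zeros((n, n))
    K = np.zeros((B.shape[1], n))
    for _ in range(iters):
        # Riccati-style VI update with discount factor gamma
        G = R + gamma * B.T @ P @ B
        K = gamma * np.linalg.solve(G, B.T @ P @ A)
        P_next = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    return P, K

# Illustrative scalar system (values are assumptions for demonstration)
A = np.array([[0.9]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
gamma = 0.95
P, K = discounted_lqr_vi(A, B, Q, R, gamma)
```

The model-free algorithm in the brief replaces the exact update above with estimates learned from offline trajectories, which is why its policies converge only to a small neighborhood of the optimum rather than to it exactly.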
Journal Introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.