{"title":"A Reinforcement Learning Look at Risk-Sensitive Linear Quadratic Gaussian Control","authors":"Leilei Cui, Zhong-Ping Jiang","doi":"10.48550/arXiv.2212.02072","DOIUrl":null,"url":null,"abstract":"This paper proposes a novel robust reinforcement learning framework for discrete-time systems with model mismatch that may arise from the sim2real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy iteration algorithm is proposed to generate a robust optimal controller. The dual-loop policy iteration algorithm is shown to be globally exponentially and uniformly convergent, and robust against disturbance during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy iteration algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is small. In addition, when the system dynamics is unknown, a novel model-free off-policy policy iteration algorithm is proposed for the same class of dynamical system with additive Gaussian noise. Finally, numerical examples are provided for the demonstration of the proposed algorithm.","PeriodicalId":268449,"journal":{"name":"Conference on Learning for Dynamics & Control","volume":"412 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Conference on Learning for Dynamics & Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2212.02072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
This paper proposes a novel robust reinforcement learning framework for discrete-time systems with model mismatch, such as that arising from the sim2real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of classical risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy iteration algorithm is proposed to generate a robust optimal controller. The dual-loop policy iteration algorithm is shown to be globally exponentially and uniformly convergent, and robust to disturbances during the learning process. This robustness property, called small-disturbance input-to-state stability, guarantees that the proposed policy iteration algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy iteration algorithm is proposed for the same class of dynamical systems with additive Gaussian noise. Finally, numerical examples are provided to demonstrate the proposed algorithm.
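For context, risk-sensitive LQG replaces the usual expected quadratic cost with an exponential-of-quadratic objective. In one standard normalization (the paper's exact scaling may differ),

```latex
J(u) \;=\; \frac{2}{\beta}\,\log \mathbb{E}\!\left[\exp\!\Big(\frac{\beta}{2}\sum_{k=0}^{N-1}\big(x_k^\top Q x_k + u_k^\top R u_k\big)\Big)\right],
```

where β > 0 is the risk-sensitivity parameter: as β → 0 the ordinary LQG cost is recovered, while for β > 0 the problem is equivalent to a zero-sum linear quadratic dynamic game with disturbance attenuation level γ (roughly γ² ∝ 1/β), which is the source of the controller's robustness. The sketch below illustrates one plausible shape of a dual-loop policy iteration for that game; it is not the paper's exact algorithm, and the update formulas, the stabilizing initial gain K0, and the choice of γ are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def dual_loop_pi(A, B, D, Q, R, gamma, K0, n_outer=30, n_inner=100, tol=1e-9):
    """Illustrative dual-loop policy iteration for the zero-sum LQ game
        min_u max_w  sum_k (x'Qx + u'Ru - gamma^2 w'w),
        x_{k+1} = A x_k + B u_k + D w_k,
    the game-theoretic counterpart of risk-sensitive LQG.
    Assumes K0 stabilizes A - B @ K0 and gamma exceeds the critical
    attenuation level, so the inner loop is well posed.
    """
    n = A.shape[0]
    q = D.shape[1]
    K = K0
    for _ in range(n_outer):
        # Inner loop: for the fixed control gain K, iterate on the
        # worst-case disturbance gain L (the maximizing player).
        L = np.zeros((q, n))
        for _ in range(n_inner):
            A_cl = A - B @ K + D @ L
            # Policy evaluation: P = A_cl' P A_cl + Q + K'RK - gamma^2 L'L
            P = solve_discrete_lyapunov(
                A_cl.T, Q + K.T @ R @ K - gamma**2 * L.T @ L)
            # Inner policy improvement (disturbance update):
            # L = (gamma^2 I - D'PD)^{-1} D'P(A - BK)
            L_new = np.linalg.solve(gamma**2 * np.eye(q) - D.T @ P @ D,
                                    D.T @ P @ (A - B @ K))
            if np.linalg.norm(L_new - L) < tol:
                L = L_new
                break
            L = L_new
        # Outer policy improvement: best control response (u = -Kx)
        # to the current worst-case disturbance w = L x.
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ (A + D @ L))
        if np.linalg.norm(K_new - K) < tol:
            return K_new, P
        K = K_new
    return K, P

# Example usage on a scalar system (illustrative values):
A = np.array([[0.9]]); B = np.array([[1.0]]); D = np.array([[0.5]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
K, P = dual_loop_pi(A, B, D, Q, R, gamma=5.0, K0=np.zeros((1, 1)))
```

In the model-free, off-policy variant described in the abstract, the Lyapunov-equation policy-evaluation step would be replaced by a least-squares estimate built from measured trajectories of the noisy system, so that A, B, and D never need to be known explicitly.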