Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar
{"title":"基于学习控制策略的策略优化理论基础","authors":"Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar","doi":"10.1146/annurev-control-042920-020021","DOIUrl":null,"url":null,"abstract":"Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexityof gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), [Formula: see text] control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.","PeriodicalId":29961,"journal":{"name":"Annual Review of Control Robotics and Autonomous Systems","volume":null,"pages":null},"PeriodicalIF":11.2000,"publicationDate":"2023-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies\",\"authors\":\"Bin Hu, Kaiqing Zhang, Na Li, Mehran Mesbahi, Maryam Fazel, Tamer Başar\",\"doi\":\"10.1146/annurev-control-042920-020021\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexityof gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), [Formula: see text] control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.\",\"PeriodicalId\":29961,\"journal\":{\"name\":\"Annual Review of Control Robotics and Autonomous Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":11.2000,\"publicationDate\":\"2023-05-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Annual Review of Control Robotics and Autonomous Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1146/annurev-control-042920-020021\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Review of Control Robotics and Autonomous Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1146/annurev-control-042920-020021","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies
Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexityof gradient-based methods for various continuous control problems, such as the linear quadratic regulator (LQR), [Formula: see text] control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
期刊介绍:
The Annual Review of Control, Robotics, and Autonomous Systems offers comprehensive reviews on theoretical and applied developments influencing autonomous and semiautonomous systems engineering. Major areas covered include control, robotics, mechanics, optimization, communication, information theory, machine learning, computing, and signal processing. The journal extends its reach beyond engineering to intersect with fields like biology, neuroscience, and human behavioral sciences. The current volume has transitioned to open access through the Subscribe to Open program, with all articles published under a CC BY license.