Direct data-driven discounted infinite horizon linear quadratic regulator with robustness guarantees

IF 5.9 · Zone 2 (Computer Science) · JCR Q1, Automation & Control Systems

Automatica, vol. 175, Article 112197 · Pub Date: 2025-02-12 · DOI: 10.1016/j.automatica.2025.112197

Ramin Esmzad, Hamidreza Modares


Citations: 0

Abstract


Direct data-driven discounted infinite horizon linear quadratic regulator with robustness guarantees

This paper presents a one-shot learning approach with performance and robustness guarantees for the linear quadratic regulator (LQR) control of stochastic linear systems. Even though data-based LQR control has been widely considered, existing results suffer either from data hungriness due to the inherently iterative nature of the optimization formulation (e.g., value learning or policy gradient reinforcement learning algorithms) or from a lack of robustness guarantees in one-shot non-iterative algorithms. To avoid data hungriness while ensuring robustness guarantees, an adaptive dynamic programming formalization of the LQR is presented that relies on solving a Bellman inequality. The control gain and the value function are directly learned by using a control-oriented approach that characterizes the closed-loop system using data and a decision variable from which the control is obtained. This closed-loop characterization is noise-dependent. The effect of the closed-loop system noise on the Bellman inequality is considered to ensure both robust stability and suboptimal performance despite ignoring the measurement noise. To ensure robust stability, it is shown that this system characterization leads to a closed-loop system with multiplicative and additive noise, enabling the application of distributionally robust control techniques. The analysis of the suboptimality gap reveals that robustness can be achieved by construction without the need for regularization or parameter tuning. The simulation results on the active car suspension problem demonstrate the superiority of the proposed method in terms of robustness and performance gap compared to existing methods.
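For orientation, the discounted infinite-horizon LQR problem and the Bellman inequality mentioned in the abstract can be illustrated with a small model-based sketch. This is not the paper's data-driven method: it assumes the system matrices `A` and `B` are known, omits the noise terms, and uses plain value iteration on the discounted Riccati recursion. The function names, the double-integrator example, and the discount factor `gamma = 0.95` are all illustrative assumptions.

```python
import numpy as np

def lqr_gain(P, A, B, R, gamma):
    """Greedy gain for value matrix P: K = gamma (R + gamma B'PB)^{-1} B'PA."""
    BtP = B.T @ P
    return gamma * np.linalg.solve(R + gamma * BtP @ B, BtP @ A)

def bellman_residual(P, K, A, B, Q, R, gamma):
    """Q + K'RK + gamma (A-BK)' P (A-BK) - P for the policy u = -K x.

    A negative semidefinite residual means V(x) = x'Px satisfies the
    Bellman inequality for this policy, so it upper-bounds the discounted cost.
    """
    Acl = A - B @ K
    return Q + K.T @ R @ K + gamma * Acl.T @ P @ Acl - P

def discounted_lqr(A, B, Q, R, gamma, iters=10000, tol=1e-10):
    """Value iteration on the discounted Riccati recursion (model-based)."""
    P = Q.copy()
    for _ in range(iters):
        K = lqr_gain(P, A, B, R, gamma)
        step = bellman_residual(P, K, A, B, Q, R, gamma)
        P = P + step  # one Riccati update: P <- Q + K'RK + gamma Acl' P Acl
        if np.max(np.abs(step)) < tol:
            break
    return P, lqr_gain(P, A, B, R, gamma)

# Illustrative example (assumed, not from the paper): a discretized double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 0.95

P, K = discounted_lqr(A, B, Q, R, gamma)

# At the fixed point the Bellman equation holds, so the residual is near zero.
residual = bellman_residual(P, K, A, B, Q, R, gamma)

# Inflating P gives a strict Bellman inequality for the same gain.
M = bellman_residual(1.1 * P, K, A, B, Q, R, gamma)

print("gain K:", K)
print("max |Bellman residual|:", np.max(np.abs(residual)))
print("max eigenvalue with 1.1 * P:", np.max(np.linalg.eigvalsh((M + M.T) / 2)))
```

The last check shows why an inequality is a natural relaxation: any `P` that makes the residual negative semidefinite certifies an upper bound on the cost of the gain `K`, which is the kind of condition a one-shot optimization can impose as a constraint.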


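Source journal

Automatica (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 10.70

Self-citation rate: 7.80%

Articles published: 617

Review time: 5 months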

About the journal: Automatica is a leading archival publication in the field of systems and control. The field today encompasses a broad set of areas and topics, and is thriving not only within itself but also in terms of its impact on other fields, such as communications, computers, biology, energy and economics. Since its inception in 1963, Automatica has kept abreast of the evolution of the field, and has emerged as a leading publication driving its trends. Automatica became a journal of the International Federation of Automatic Control (IFAC) in 1969. It features a characteristic blend of theoretical and applied papers of archival, lasting value, reporting cutting-edge research results by authors across the globe. It features articles in distinct categories, including regular, brief and survey papers, technical communiqués, correspondence items, as well as reviews of published books of interest to the readership. It occasionally publishes special issues on emerging new topics or established mature topics of interest to a broad audience. Automatica solicits original high-quality contributions in all the categories listed above, and in all areas of systems and control, interpreted in a broad sense and evolving constantly. Submissions may be sent directly to a subject editor, or to the Editor-in-Chief when the subject area is unclear. Editorial procedures in place assure careful, fair, and prompt handling of all submitted articles. Accepted papers appear in the journal in the shortest time feasible given production time constraints.