Robust adaptive maximum-entropy linear quadratic regulator

IF 1.8 Q3 AUTOMATION & CONTROL SYSTEMS

IFAC Journal of Systems and Control Pub Date : 2025-03-29 DOI:10.1016/j.ifacsc.2025.100305

Ahmed Kamel, Ramin Esmzad, Nariman Niknejad, Hamidreza Modares

IFAC Journal of Systems and Control, Volume 32, Article 100305. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S2468601825000112

Citations: 0

Abstract

Balancing exploration (venturing into the unknown to learn) against exploitation (optimizing outcomes on familiar ground to deliver performance) is a longstanding challenge in learning-enabled control systems. It is particularly challenging when learning starts with no data and rich data must be collected from the closed-loop system, in sharp contrast to the standard practice in data-driven control, which assumes that rich open-loop data have been collected a priori. To ensure that the closed-loop system delivers acceptable performance despite exploring for rich data in the linear quadratic regulator (LQR) setting, we first formalize a linear matrix inequality (LMI) solution to an LQR problem regularized by the control entropy. Given available side information (e.g., a set to which the system parameters belong), a conservative solution to the LQR problem can be found. To reduce this conservatism over time while ensuring acceptable performance during learning, we present a set-membership closed-loop system identification method and integrate it with the side information when solving the entropy-regularized LQR problem, using the Schur complement together with the lossy S-procedure. We show that the set-membership approach progressively improves the entropy-regularized LQR cost by shrinking the set of system parameters, and that it does so while guaranteeing acceptable performance. We present an iterative algorithm that uses closed-loop set-membership learning to compute an improved controller after each online data sample, which is collected by applying the currently learned control policy. Simulation examples verify the effectiveness of the results.
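The set-shrinking idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a scalar system x_{k+1} = a x_k + b u_k + w_k with known b, a disturbance bounded by |w_k| <= w_bar, a prior interval for a standing in for the side information, and a fixed Gaussian feedback policy standing in for an entropy-regularized controller. The function name and all parameter values are hypothetical.

```python
import random


def set_membership_scalar(a_true=0.9, b=1.0, w_bar=0.05, k_gain=0.5,
                          sigma=0.3, steps=30, a_lo=-2.0, a_hi=2.0, seed=0):
    """Shrink a prior interval [a_lo, a_hi] for the unknown parameter a
    using data collected in closed loop (illustrative assumptions only)."""
    rng = random.Random(seed)
    x = 1.0
    widths = [a_hi - a_lo]  # interval width after each sample
    for _ in range(steps):
        # Stochastic policy: linear feedback plus Gaussian exploration noise.
        u = -k_gain * x + sigma * rng.gauss(0.0, 1.0)
        w = rng.uniform(-w_bar, w_bar)  # bounded disturbance
        x_next = a_true * x + b * u + w
        if abs(x) > 1e-9:
            # |x_next - a*x - b*u| <= w_bar confines a to an interval.
            r = x_next - b * u
            lo, hi = sorted(((r - w_bar) / x, (r + w_bar) / x))
            # Intersect with the current set; the set can only shrink.
            a_lo, a_hi = max(a_lo, lo), min(a_hi, hi)
        widths.append(a_hi - a_lo)
        x = x_next
    return a_lo, a_hi, widths


if __name__ == "__main__":
    lo, hi, widths = set_membership_scalar()
    print(f"a lies in [{lo:.4f}, {hi:.4f}]; final width {widths[-1]:.4f}")
```

Each new sample intersects the current interval with the set of parameters consistent with that sample, so the width never grows and the true parameter is never excluded. In the matrix case the consistent set is no longer an interval, which is where tools such as the Schur complement and the S-procedure named in the abstract would come in.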

Source journal

IFAC Journal of Systems and Control (Automation & Control Systems)

CiteScore

3.70

Self-citation rate

5.30%

Publication count