Robust adaptive maximum-entropy linear quadratic regulator

IF 1.8 Q3 AUTOMATION & CONTROL SYSTEMS
Ahmed Kamel, Ramin Esmzad, Nariman Niknejad, Hamidreza Modares
DOI: 10.1016/j.ifacsc.2025.100305
Journal: IFAC Journal of Systems and Control, vol. 32, Article 100305
Published: 2025-03-29
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S2468601825000112
Citations: 0

Abstract

Robust adaptive maximum-entropy linear quadratic regulator
Balancing the trade-off between venturing into the unknown (exploration for learning) and optimizing outcomes on familiar ground (exploitation for performance delivery) is a longstanding challenge in learning-enabled control systems. It is particularly challenging when the learning process starts with no data and rich data must be collected from the closed-loop system, in sharp contrast to the standard practice in data-driven control, which assumes that rich open-loop data have been collected a priori. To ensure that the closed-loop system delivers acceptable performance despite exploring to collect rich data, in the context of the linear quadratic regulator (LQR), we first formalize a linear matrix inequality (LMI) solution for an LQR problem regularized by the control entropy. Given available side information (e.g., a set to which the system parameters belong), a conservative solution to the LQR problem can be found. To reduce the conservatism over time while ensuring acceptable performance during learning, we present a set-membership closed-loop system identification method and integrate it with the side information in solving the entropy-regularized LQR problem through the Schur complement and the lossy S-procedure. We show that the presented set-membership approach progressively improves the entropy-regularized LQR cost by shrinking the set of system parameters, and that it does so while guaranteeing acceptable performance. An iterative algorithm based on closed-loop set-membership learning is presented that learns a new, improved controller after every online data sample, each collected by applying the currently learned control policy. Simulation examples are provided to verify the effectiveness of the presented results.
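The set-membership idea in the abstract can be illustrated with a deliberately simplified scalar sketch. This is not the paper's algorithm, which works with matrix parameter sets, LMIs, and the S-procedure. It assumes a hypothetical plant x_next = a*x + u + w with unknown a, a known noise bound |w| <= w_max, and prior side information a in [-2, 2]. The names `update_interval` and `run`, the deadbeat-style feedback, and the Gaussian exploration noise are all assumptions made for illustration. Each closed-loop sample yields an interval of values of a consistent with the data, and intersecting it with the current interval can only shrink the feasible set.

```python
import random


def update_interval(lo, hi, x, u, x_next, w_max):
    """Intersect [lo, hi] with the values of `a` consistent with one sample
    of x_next = a*x + u + w, where |w| <= w_max."""
    if abs(x) < 1e-12:
        return lo, hi  # x is (numerically) zero: the sample says nothing about a
    # a*x must lie in [x_next - u - w_max, x_next - u + w_max]
    b1 = (x_next - u - w_max) / x
    b2 = (x_next - u + w_max) / x
    new_lo, new_hi = min(b1, b2), max(b1, b2)  # handles negative x
    return max(lo, new_lo), min(hi, new_hi)


def run(a_true=0.9, w_max=0.1, steps=50, seed=0):
    """Simulate closed-loop data collection and return the final interval
    for `a` together with the interval width after every sample."""
    rng = random.Random(seed)
    lo, hi = -2.0, 2.0  # assumed prior side information: a in [-2, 2]
    x = 1.0
    widths = [hi - lo]
    for _ in range(steps):
        a_mid = 0.5 * (lo + hi)
        # Feedback based on the interval midpoint plus random exploration,
        # loosely mimicking a stochastic (entropy-regularized) policy.
        u = -a_mid * x + rng.gauss(0.0, 0.5)
        w = rng.uniform(-w_max, w_max)  # bounded disturbance
        x_next = a_true * x + u + w
        lo, hi = update_interval(lo, hi, x, u, x_next, w_max)
        widths.append(hi - lo)
        x = x_next
    return lo, hi, widths


if __name__ == "__main__":
    lo, hi, widths = run()
    print(f"feasible interval for a: [{lo:.4f}, {hi:.4f}]")
    print(f"width: {widths[0]:.2f} -> {widths[-1]:.4f}")
```

Because the update is a pure intersection, the interval width never increases, which is the scalar analogue of the shrinking parameter set described above. A robust controller designed for the whole interval would correspondingly become less conservative as the interval tightens.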
Source journal: IFAC Journal of Systems and Control (Automation & Control Systems)
CiteScore: 3.70
Self-citation rate: 5.30%
Articles published: 17