Robust adaptive maximum-entropy linear quadratic regulator

Impact factor: 1.8 · JCR Q3 · Automation & Control Systems
Ahmed Kamel, Ramin Esmzad, Nariman Niknejad, Hamidreza Modares
Journal: IFAC Journal of Systems and Control, vol. 32, Article 100305
Published: 2025-03-29
DOI: 10.1016/j.ifacsc.2025.100305
URL: https://www.sciencedirect.com/science/article/pii/S2468601825000112
Citations: 0

Abstract

Balancing exploration (venturing into the unknown to learn) against exploitation (optimizing performance on familiar ground) is a longstanding challenge in learning-enabled control systems. The challenge is especially acute when learning starts with no data and rich data must be collected from the closed-loop system, in sharp contrast to the standard practice in data-driven control, which assumes that rich open-loop data have been collected a priori. To ensure that the closed-loop system delivers acceptable performance despite exploring for rich data in the linear quadratic regulator (LQR) setting, we first formalize a linear matrix inequality (LMI) solution for an LQR problem regularized by the control entropy. Given available side information (e.g., a set to which the system parameters belong), a conservative solution to the LQR can be found. To reduce this conservatism over time while maintaining acceptable performance during learning, we present a set-membership closed-loop system identification method and integrate it with the side information when solving the entropy-regularized LQR, using the Schur complement together with the lossy S-procedure. We show that the set-membership approach progressively improves the entropy-regularized LQR cost by shrinking the set of admissible system parameters, and that it does so while guaranteeing acceptable performance. We present an iterative algorithm that uses the closed-loop set-membership learning to compute a new, improved controller after each online data sample is collected under the currently learned control policy. Simulation examples verify the effectiveness of the presented results.
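
The idea that closed-loop set-membership learning shrinks the parameter set sample by sample can be illustrated with a deliberately simplified sketch. It is not the paper's LMI-based method. It assumes a hypothetical scalar system x_next = a*x + b*u + w with known b, an unknown a, and noise bounded by |w| <= eps. Side information is modelled as a prior interval for a, and exploration comes from a Gaussian (stochastic) control policy whose gain is held fixed for brevity rather than re-optimized after each sample. All names and numerical values are illustrative assumptions.

```python
import random


def refine_interval(lo, hi, x, u, x_next, b, eps):
    """Intersect [lo, hi] with the values of a consistent with one sample.

    Consistency means |x_next - a*x - b*u| <= eps (bounded-noise assumption).
    """
    if abs(x) < 1e-9:
        return lo, hi  # a sample with x ~ 0 carries no information about a
    r = x_next - b * u  # equals a*x + w for the true a
    c1, c2 = (r - eps) / x, (r + eps) / x
    # min/max handles the sign of x, which flips the interval endpoints
    return max(lo, min(c1, c2)), min(hi, max(c1, c2))


def run_demo(steps=50, seed=0):
    rng = random.Random(seed)
    a_true, b, eps = 0.9, 1.0, 0.05  # hypothetical system and noise bound
    lo, hi = 0.0, 1.5                # side information: a lies in [0, 1.5]
    k_gain, sigma = 0.5, 0.3         # fixed gain and exploration std (assumed)
    x = 1.0
    widths = [hi - lo]
    for _ in range(steps):
        # stochastic policy: mean -k*x plus Gaussian exploration
        u = -k_gain * x + rng.gauss(0.0, sigma)
        w = rng.uniform(-eps, eps)   # bounded disturbance
        x_next = a_true * x + b * u + w
        lo, hi = refine_interval(lo, hi, x, u, x_next, b, eps)
        widths.append(hi - lo)
        x = x_next
    return lo, hi, widths, a_true


if __name__ == "__main__":
    lo, hi, widths, a_true = run_demo()
    print(f"a_true = {a_true}, final interval = [{lo:.4f}, {hi:.4f}]")
```

In this toy setting the feasible interval can only shrink, since each update is an intersection, and the true parameter stays inside it. A robust controller designed for the whole interval would therefore become less conservative as data accumulate, which mirrors the qualitative claim in the abstract.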
Source journal
IFAC Journal of Systems and Control (Automation & Control Systems)
CiteScore: 3.70
Self-citation rate: 5.30%
Articles published: 17