Robust adaptive maximum-entropy linear quadratic regulator
Ahmed Kamel, Ramin Esmzad, Nariman Niknejad, Hamidreza Modares
IFAC Journal of Systems and Control, Volume 32, Article 100305 (published 2025-03-29). DOI: 10.1016/j.ifacsc.2025.100305
https://www.sciencedirect.com/science/article/pii/S2468601825000112
Abstract
Balancing the trade-off between venturing into unknowns (exploration for learning) and optimizing outcomes within familiar grounds (exploitation for performance delivery) is a longstanding challenge in learning-enabled control systems. It is especially challenging when the learning process starts with no data and rich data must be collected from the closed-loop system. This is in sharp contrast to the standard practice in data-driven control, which assumes the availability of rich open-loop data collected a priori. To ensure that the closed-loop system delivers acceptable performance despite exploration for rich data collection in the context of the linear quadratic regulator (LQR), we first formalize a linear matrix inequality (LMI) solution for an LQR problem that is regularized by the control entropy. Given available side information (e.g., a set to which the system parameters belong), a conservative solution to the LQR can be found. To reduce this conservatism over time while ensuring acceptable performance during learning, we present a set-membership closed-loop system identification scheme and integrate it with side information in solving the entropy-regularized LQR through the Schur complement along with the lossy S-procedure. We show that the presented set-membership approach progressively improves the entropy-regularized LQR cost by shrinking the size of the set of system parameters, and that this is achieved while guaranteeing acceptable performance. An iterative algorithm is presented that uses closed-loop set-membership learning to progressively learn an improved controller after each online data sample collected by applying the currently learned control policy. Simulation examples are provided to verify the effectiveness of the presented results.
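The abstract describes the method only at a high level. As a minimal, hypothetical sketch of the closed-loop set-membership loop it outlines, the snippet below reduces the problem to a scalar system with bounded noise and uses a brute-force worst-case gain search in place of the paper's LMI/S-procedure synthesis, with the entropy regularization omitted. All names (e.g., `robust_gain`), parameter values, and the scalar simplification are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical, drastically simplified sketch: scalar system
# x_{t+1} = a*x_t + b*u_t + w with known b, unknown a, bounded noise |w| <= w_bar.
# A grid search over worst-case LQR cost stands in for the robust LMI design.

rng = np.random.default_rng(0)

a_true, b = 1.2, 1.0        # "true" dynamics; a_true is hidden from the learner
w_bar = 0.05                # known noise bound
q, r = 1.0, 0.1             # LQR weights
a_lo, a_hi = 0.5, 2.0       # side information: interval known to contain a_true


def robust_gain(a_lo, a_hi, b, q, r):
    """Gain with the smallest worst-case LQR cost over a grid of feasible models
    (a crude stand-in for the robust synthesis described in the abstract)."""
    a_grid = np.linspace(a_lo, a_hi, 21)
    best_k, best_cost = None, np.inf
    for k in np.linspace(-5.0, 5.0, 1001):
        a_cl = a_grid + b * k                                # closed-loop pole per model
        if np.any(np.abs(a_cl) >= 1.0):
            continue                                         # not stabilizing for the whole set
        cost = np.max((q + r * k**2) / (1.0 - a_cl**2))      # worst-case cost from x0 = 1
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


x = 1.0
for t in range(30):
    k = robust_gain(a_lo, a_hi, b, q, r)                     # conservative controller for current set
    u = k * x
    x_next = a_true * x + b * u + rng.uniform(-w_bar, w_bar)

    # Set-membership update: each sample with x != 0 constrains a to an interval,
    # which is intersected with the current feasible set.
    if abs(x) > 1e-9:
        bounds = ((x_next - b * u - w_bar) / x, (x_next - b * u + w_bar) / x)
        a_lo, a_hi = max(a_lo, min(bounds)), min(a_hi, max(bounds))
    x = x_next

print(f"feasible set for a: [{a_lo:.3f}, {a_hi:.3f}], final gain k = {k:.3f}")
```

As the feasible interval for `a` shrinks with each closed-loop sample, the worst-case design becomes less conservative, mirroring (in toy form) the progressive cost improvement the abstract attributes to the set-membership approach.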