{"title":"Data-Driven Optimization-Based Cost and Optimal Control Inference","authors":"Jiacheng Wu;Wenqian Xue;Frank L. Lewis;Bosen Lian","doi":"10.1109/LCSYS.2025.3584907","DOIUrl":null,"url":null,"abstract":"This letter develops a novel optimization-based inverse reinforcement learning (RL) control algorithm that infers the minimal cost from observed demonstrations via optimization-based policy evaluation and update. The core idea is the simultaneous evaluation of the value function matrix and cost weight during policy evaluation under a given control policy, which simplifies the algorithmic structure and reduces the iterations required for convergence. Based on this idea, we first develop a model-based algorithm with detailed implementation steps, and analyze the monotonicity and convergence properties of the cost weight. Then, based on Willems’ lemma, we develop a data-driven algorithm to learn an equivalent weight matrix from persistently excited (PE) data. We also prove the convergence of the data-driven algorithm and show that the converged results learned from PE data are unbiased. Finally, simulations on a power system are carried out to demonstrate the effectiveness of the proposed inverse RL algorithm.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"1700-1705"},"PeriodicalIF":2.0000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11062121/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Abstract
This letter develops a novel optimization-based inverse reinforcement learning (RL) control algorithm that infers the minimal cost from observed demonstrations via optimization-based policy evaluation and policy update. The core idea is to evaluate the value function matrix and the cost weight simultaneously during policy evaluation under a given control policy, which simplifies the algorithmic structure and reduces the number of iterations required for convergence. Based on this idea, we first develop a model-based algorithm with detailed implementation steps and analyze the monotonicity and convergence properties of the cost weight. Then, based on Willems' fundamental lemma, we develop a data-driven algorithm that learns an equivalent weight matrix from persistently exciting (PE) data. We also prove the convergence of the data-driven algorithm and show that the results learned from PE data are unbiased at convergence. Finally, simulations on a power system demonstrate the effectiveness of the proposed inverse RL algorithm.
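
To make the core idea concrete, the sketch below jointly solves for a value matrix P and a cost weight Q from an expert's feedback gain, in the spirit of the simultaneous policy evaluation described above. It is a minimal illustration, not the letter's algorithm: it assumes known discrete-time dynamics (A, B), a known input weight R, and an expert gain K_e already estimated from demonstrations; `infer_cost` and `sym_basis` are illustrative helper names, and the two conditions imposed are the standard discrete-time LQR Lyapunov and stationarity equations.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def sym_basis(n):
    """Basis matrices spanning the n x n symmetric matrices."""
    basis = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return basis

def infer_cost(A, B, K_e, R):
    """Jointly solve for symmetric (P, Q) such that the expert gain K_e
    satisfies the LQR policy-evaluation (Lyapunov) and stationarity
    conditions, with Acl = A - B K_e:
        Acl' P Acl - P + Q + K_e' R K_e = 0,
        B' P Acl = R K_e.
    The stacked linear system is solved by least squares; inverse LQR is
    not unique, so this returns one cost from the equivalent family."""
    Acl = A - B @ K_e
    bP = sym_basis(A.shape[0])
    bQ = sym_basis(A.shape[0])
    cols = []
    for E in bP:  # columns multiplying the unknown entries of P
        cols.append(np.concatenate([(Acl.T @ E @ Acl - E).ravel(),
                                    (B.T @ E @ Acl).ravel()]))
    for E in bQ:  # columns multiplying the unknown entries of Q
        cols.append(np.concatenate([E.ravel(),
                                    np.zeros((B.shape[1], A.shape[0])).ravel()]))
    M = np.column_stack(cols)
    rhs = np.concatenate([(-K_e.T @ R @ K_e).ravel(), (R @ K_e).ravel()])
    theta = np.linalg.lstsq(M, rhs, rcond=None)[0]
    k = len(bP)
    P = sum(t * E for t, E in zip(theta[:k], bP))
    Q = sum(t * E for t, E in zip(theta[k:], bQ))
    return P, Q

# Demo: the expert acts optimally for an unknown weight Q_true.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
R = np.eye(1)
Q_true = np.diag([2.0, 1.0])
P_true = solve_discrete_are(A, B, Q_true, R)
K_e = np.linalg.solve(R + B.T @ P_true @ B, B.T @ P_true @ A)

P_hat, Q_hat = infer_cost(A, B, K_e, R)
# The recovered cost makes the expert gain stationary: the greedy gain
# computed from P_hat coincides with K_e (up to numerical error).
K_hat = np.linalg.solve(R + B.T @ P_hat @ B, B.T @ P_hat @ A)
print("gain mismatch:", np.max(np.abs(K_hat - K_e)))
```

Because inverse LQR admits a family of equivalent costs, the least-squares solution generally differs from the weight that generated the demonstrations, yet it reproduces the expert gain exactly; this mirrors the letter's notion of learning an equivalent weight matrix.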
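
The data-driven step builds on Willems' fundamental lemma, which states that, for a controllable system driven by a persistently exciting input, the Hankel matrix of measured data parameterizes all system trajectories. Below is a minimal sketch of the standard rank check used in data-driven design; `block_hankel` is a hypothetical helper, and the system matrices are illustrative, not taken from the letter's power-system example.

```python
import numpy as np

def block_hankel(w, L):
    """Depth-L block Hankel matrix of a signal w with shape (T, q);
    stacks L shifted copies, giving shape (L*q, T-L+1)."""
    T = w.shape[0]
    N = T - L + 1
    return np.vstack([w[i:i + N].T for i in range(L)])

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
n, m, T, L = 2, 1, 40, 3

u = rng.standard_normal((T, m))      # random input is PE w.p. 1
x = np.zeros((T + 1, n))
for k in range(T):                   # one measured trajectory
    x[k + 1] = A @ x[k] + B @ u[k]

# Rank condition behind data-driven design: the stacked matrix of
# initial states and depth-L input columns has full row rank n + m*L,
# so every length-L trajectory is a linear combination of its columns.
H = np.vstack([x[:T - L + 1].T, block_hankel(u, L)])
print(np.linalg.matrix_rank(H), "==", n + m * L)
```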