Data-Driven Optimization-Based Cost and Optimal Control Inference

IF 2.0 | Q2 | Automation & Control Systems
Jiacheng Wu, Wenqian Xue, Frank L. Lewis, Bosen Lian
{"title":"基于数据驱动优化的成本与最优控制推理","authors":"Jiacheng Wu;Wenqian Xue;Frank L. Lewis;Bosen Lian","doi":"10.1109/LCSYS.2025.3584907","DOIUrl":null,"url":null,"abstract":"This letter develops a novel optimization-based inverse reinforcement learning (RL) control algorithm that infers the minimal cost from observed demonstrations via optimization-based policy evaluation and update. The core idea is the simultaneous evaluation of the value function matrix and cost weight during policy evaluation under a given control policy, which simplifies the algorithmic structure and reduces the iterations required for convergence. Based on this idea, we first develop a model-based algorithm with detailed implementation steps, and analyze the monotonicity and convergence properties of the cost weight. Then, based on Willems’ lemma, we develop a data-driven algorithm to learn an equivalent weight matrix from persistently excited (PE) data. We also prove the convergence of the data-driven algorithm and show that the converged results learned from PE data are unbiased. Finally, simulations on a power system are carried out to demonstrate the effectiveness of the proposed inverse RL algorithm.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"9 ","pages":"1700-1705"},"PeriodicalIF":2.0000,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Data-Driven Optimization-Based Cost and Optimal Control Inference\",\"authors\":\"Jiacheng Wu;Wenqian Xue;Frank L. Lewis;Bosen Lian\",\"doi\":\"10.1109/LCSYS.2025.3584907\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This letter develops a novel optimization-based inverse reinforcement learning (RL) control algorithm that infers the minimal cost from observed demonstrations via optimization-based policy evaluation and update. The core idea is the simultaneous evaluation of the value function matrix and cost weight during policy evaluation under a given control policy, which simplifies the algorithmic structure and reduces the iterations required for convergence. Based on this idea, we first develop a model-based algorithm with detailed implementation steps, and analyze the monotonicity and convergence properties of the cost weight. Then, based on Willems’ lemma, we develop a data-driven algorithm to learn an equivalent weight matrix from persistently excited (PE) data. We also prove the convergence of the data-driven algorithm and show that the converged results learned from PE data are unbiased. 
Finally, simulations on a power system are carried out to demonstrate the effectiveness of the proposed inverse RL algorithm.\",\"PeriodicalId\":37235,\"journal\":{\"name\":\"IEEE Control Systems Letters\",\"volume\":\"9 \",\"pages\":\"1700-1705\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Control Systems Letters\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11062121/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11062121/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

This letter develops a novel optimization-based inverse reinforcement learning (RL) control algorithm that infers the minimal cost from observed demonstrations via optimization-based policy evaluation and update. The core idea is the simultaneous evaluation of the value function matrix and cost weight during policy evaluation under a given control policy, which simplifies the algorithmic structure and reduces the iterations required for convergence. Based on this idea, we first develop a model-based algorithm with detailed implementation steps, and analyze the monotonicity and convergence properties of the cost weight. Then, based on Willems’ lemma, we develop a data-driven algorithm to learn an equivalent weight matrix from persistently excited (PE) data. We also prove the convergence of the data-driven algorithm and show that the converged results learned from PE data are unbiased. Finally, simulations on a power system are carried out to demonstrate the effectiveness of the proposed inverse RL algorithm.
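For intuition, the sketch below poses the cost-inference problem in the discrete-time LQR setting. It is only a naive baseline under stated assumptions, not the letter's optimization-based algorithm: it fits a diagonal state weight Q so that the induced optimal gain reproduces an observed expert gain K_star. The plant (A, B), the input weight R, and the hidden Q_true are hypothetical placeholders.

```python
# Naive inverse-LQR baseline (illustration only, not the letter's method):
# search for a diagonal state weight Q whose optimal LQR gain matches the
# gain demonstrated by an expert. All matrices here are hypothetical.
import numpy as np
from scipy.linalg import solve_discrete_are
from scipy.optimize import minimize

A = np.array([[1.0, 0.1],
              [0.0, 0.9]])          # hypothetical discrete-time plant
B = np.array([[0.0],
              [0.1]])
R = np.eye(1)                       # input weight, assumed known

def lqr_gain(Q):
    """Optimal feedback gain K = (R + B'PB)^{-1} B'PA from the discrete ARE."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# "Demonstrations": the expert plays the gain induced by a hidden true weight.
Q_true = np.diag([5.0, 1.0])
K_star = lqr_gain(Q_true)

def gain_mismatch(q_diag):
    """Squared Frobenius distance between learner and expert gains."""
    return np.linalg.norm(lqr_gain(np.diag(q_diag)) - K_star) ** 2

res = minimize(gain_mismatch, x0=np.ones(2), bounds=[(1e-6, None)] * 2)
print("inferred diag(Q):", res.x)   # a weight reproducing the expert's gain
```

Because different weights can induce the same optimal behavior, schemes of this kind can only pin down an equivalent cost, consistent with the letter's goal of learning an equivalent weight matrix; the letter's contribution is to perform this inference jointly with policy evaluation rather than by an outer search as above.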
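The data-driven step rests on Willems' lemma, under which a single trajectory generated by persistently exciting inputs is as informative as the model itself. The following sketch, again with hypothetical matrices and not the letter's algorithm, verifies the PE rank condition and recovers an equivalent system representation from raw data; the letter applies the same principle to learn the cost weight instead.

```python
# Minimal sketch of the Willems'-lemma ingredient (illustration only):
# with persistently exciting inputs, one recorded trajectory determines an
# equivalent representation of the system. All matrices are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 1, 30                   # states, inputs, samples (hypothetical)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])

# Collect one trajectory under random (hence generically PE) inputs.
U = rng.standard_normal((m, T))
X = np.zeros((n, T + 1))
for t in range(T):
    X[:, t + 1] = A @ X[:, t] + B @ U[:, t]

X0, X1 = X[:, :T], X[:, 1:]          # shifted state-data matrices
data = np.vstack([X0, U])

# PE rank condition: the stacked data must have full row rank n + m.
assert np.linalg.matrix_rank(data) == n + m

# Any (A, B) consistent with the data satisfies X1 = [A B] [X0; U].
AB = X1 @ np.linalg.pinv(data)
print(np.allclose(AB, np.hstack([A, B]), atol=1e-8))   # True
```

In the noiseless case the recovery is exact, which mirrors the letter's claim that the results learned from PE data are unbiased.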
Source Journal
IEEE Control Systems Letters (Mathematics - Control and Optimization)
CiteScore: 4.40
Self-citation rate: 13.30%
Annual publications: 471