Integrating Reinforcement Learning and Model Predictive Control for Mixed-Logical Dynamical Systems

Caio Fabio Oliveira da Silva, Azita Dabiri, Bart De Schutter
{"title":"Integrating Reinforcement Learning and Model Predictive Control for Mixed- Logical Dynamical Systems","authors":"Caio Fabio Oliveira da Silva;Azita Dabiri;Bart De Schutter","doi":"10.1109/OJCSYS.2025.3601435","DOIUrl":null,"url":null,"abstract":"This work proposes an approach that integrates reinforcement learning (RL) and model predictive control (MPC) to solve finite-horizon optimal control problems in mixed-logical dynamical systems efficiently. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. In the proposed approach, by repeated interaction with a simulator of the system, a reinforcement learning agent is trained to provide a policy for the discrete decision variables. During online operation, the RL policy simplifies the online optimization problem of the MPC controller from a mixed-integer linear program to a linear program, significantly reducing the computation time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. A microgrid system is used as an illustrative example where real-world data is used to demonstrate that the proposed method substantially reduces the maximum online computation time of MPC (up to <inline-formula><tex-math>$20\\times$</tex-math></inline-formula>) while maintaining high feasibility and average optimality gap lower than 1.1% .","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"4 ","pages":"316-331"},"PeriodicalIF":0.0000,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11134093","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11134093/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This work proposes an approach that integrates reinforcement learning (RL) and model predictive control (MPC) to efficiently solve finite-horizon optimal control problems for mixed-logical dynamical systems. Optimization-based control of such systems, which have both discrete and continuous decision variables, entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. In the proposed approach, a reinforcement learning agent is trained, through repeated interaction with a simulator of the system, to provide a policy for the discrete decision variables. During online operation, the RL policy simplifies the MPC controller's online optimization problem from a mixed-integer linear program to a linear program, significantly reducing the computation time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. A microgrid system with real-world data is used as an illustrative example to demonstrate that the proposed method substantially reduces the maximum online computation time of MPC (up to $20\times$) while maintaining high feasibility and an average optimality gap below 1.1%.
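
The core online mechanism described in the abstract can be illustrated with a minimal sketch: once a learned policy has fixed the discrete decisions over the prediction horizon, the remaining MPC problem in the continuous inputs reduces to a linear program. This is not the authors' implementation or the microgrid model from the paper; the function name `rl_policy`, the horizon length, and the toy constraint matrices below are illustrative assumptions only.

```python
# Minimal sketch (assumed structure, not the paper's code): an RL policy fixes
# the binary decisions, and the MPC step then solves an LP instead of an MILP.
import numpy as np
from scipy.optimize import linprog

def mpc_step(x0, rl_policy, horizon=4):
    """One online MPC iteration: discrete decisions from the RL policy,
    continuous inputs from a linear program (toy dimensions for illustration)."""
    # Hypothetical RL policy: maps the current state to a binary sequence
    # delta[0..N-1], e.g. on/off decisions in the mixed-logical dynamics.
    delta = np.asarray(rl_policy(x0, horizon), dtype=float)   # shape (horizon,)

    # With delta fixed, the mixed-integer constraints become linear in the
    # continuous inputs u[0..N-1]; assemble a toy LP  min c^T u  s.t.  A u <= b.
    c = -np.ones(horizon)                        # toy objective: maximize total input
    A = np.vstack([np.eye(horizon), -np.eye(horizon)])
    b = np.concatenate([delta,                   # u_k <= delta_k: input only when "on"
                        np.zeros(horizon)])      # -u_k <= 0, i.e. u_k >= 0
    res = linprog(c, A_ub=A, b_ub=b, bounds=(None, None), method="highs")
    return res.x if res.success else None        # continuous input sequence

# Example use with a trivial stand-in policy (all units switched on):
u_seq = mpc_step(x0=np.zeros(2), rl_policy=lambda x, N: np.ones(N))
```

In the paper the linear program arises from the mixed-logical dynamical constraints of the microgrid model rather than the toy bounds above; the sketch only shows why fixing the integer variables turns the online mixed-integer linear program into a linear program that can be solved far more quickly.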