Caio Fabio Oliveira da Silva;Azita Dabiri;Bart De Schutter
{"title":"混合逻辑动力系统强化学习与模型预测控制的集成","authors":"Caio Fabio Oliveira da Silva;Azita Dabiri;Bart De Schutter","doi":"10.1109/OJCSYS.2025.3601435","DOIUrl":null,"url":null,"abstract":"This work proposes an approach that integrates reinforcement learning (RL) and model predictive control (MPC) to solve finite-horizon optimal control problems in mixed-logical dynamical systems efficiently. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. In the proposed approach, by repeated interaction with a simulator of the system, a reinforcement learning agent is trained to provide a policy for the discrete decision variables. During online operation, the RL policy simplifies the online optimization problem of the MPC controller from a mixed-integer linear program to a linear program, significantly reducing the computation time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. A microgrid system is used as an illustrative example where real-world data is used to demonstrate that the proposed method substantially reduces the maximum online computation time of MPC (up to <inline-formula><tex-math>$20\\times$</tex-math></inline-formula>) while maintaining high feasibility and average optimality gap lower than 1.1% .","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"4 ","pages":"316-331"},"PeriodicalIF":0.0000,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11134093","citationCount":"0","resultStr":"{\"title\":\"Integrating Reinforcement Learning and Model Predictive Control for Mixed- Logical Dynamical Systems\",\"authors\":\"Caio Fabio Oliveira da Silva;Azita Dabiri;Bart De Schutter\",\"doi\":\"10.1109/OJCSYS.2025.3601435\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work proposes an approach that integrates reinforcement learning (RL) and model predictive control (MPC) to solve finite-horizon optimal control problems in mixed-logical dynamical systems efficiently. Optimization-based control of such systems with discrete and continuous decision variables entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. In the proposed approach, by repeated interaction with a simulator of the system, a reinforcement learning agent is trained to provide a policy for the discrete decision variables. During online operation, the RL policy simplifies the online optimization problem of the MPC controller from a mixed-integer linear program to a linear program, significantly reducing the computation time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. 
A microgrid system is used as an illustrative example where real-world data is used to demonstrate that the proposed method substantially reduces the maximum online computation time of MPC (up to <inline-formula><tex-math>$20\\\\times$</tex-math></inline-formula>) while maintaining high feasibility and average optimality gap lower than 1.1% .\",\"PeriodicalId\":73299,\"journal\":{\"name\":\"IEEE open journal of control systems\",\"volume\":\"4 \",\"pages\":\"316-331\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-08-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11134093\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE open journal of control systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11134093/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE open journal of control systems","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11134093/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Integrating Reinforcement Learning and Model Predictive Control for Mixed-Logical Dynamical Systems
This work proposes an approach that integrates reinforcement learning (RL) and model predictive control (MPC) to efficiently solve finite-horizon optimal control problems for mixed-logical dynamical systems. Optimization-based control of such systems, which involve both discrete and continuous decision variables, entails the online solution of mixed-integer linear programs, which suffer from the curse of dimensionality. In the proposed approach, a reinforcement learning agent is trained, through repeated interaction with a simulator of the system, to provide a policy for the discrete decision variables. During online operation, the RL policy simplifies the MPC controller's online optimization problem from a mixed-integer linear program to a linear program, significantly reducing the computation time. A fundamental contribution of this work is the definition of the decoupled Q-function, which plays a crucial role in making the learning problem tractable in a combinatorial action space. We motivate the use of recurrent neural networks to approximate the decoupled Q-function and show how they can be employed in a reinforcement learning setting. A microgrid system is used as an illustrative example, with real-world data, to demonstrate that the proposed method substantially reduces the maximum online computation time of MPC (by up to $20\times$) while maintaining high feasibility and an average optimality gap below 1.1%.
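To illustrate the core idea described in the abstract, the following is a minimal sketch (not the authors' implementation) of the online step: an RL policy fixes the binary decision variables over the prediction horizon, and the remaining MPC problem over the continuous inputs reduces to a linear program. All model data below (the toy dynamics, cost coefficients, bounds, and the heuristic stand-in for the trained policy) are hypothetical placeholders, since the abstract does not specify the MLD model or cost.

```python
# Sketch of the online loop: RL policy fixes the binaries, MPC solves an LP.
# Everything here is illustrative; the paper's actual MLD model, cost, and
# constraints are not given in the abstract.
import numpy as np
from scipy.optimize import linprog

horizon = 5                      # prediction horizon N (illustrative)
x0 = np.array([1.0])             # scalar state, toy example
a, b = 0.9, 0.5                  # toy linear dynamics x_{k+1} = a*x_k + b*u_k

def rl_discrete_policy(state, horizon):
    """Stand-in for the trained RL policy: returns a binary sequence
    delta_0, ..., delta_{N-1} (here a trivial heuristic)."""
    return np.array([1 if state[0] > 0 else 0] * horizon)

def mpc_lp(x0, delta_seq):
    """With the binaries fixed by the RL policy, solve the remaining LP
    over the continuous inputs u_0, ..., u_{N-1}.
    Toy cost: sum_k c_k * u_k with 0 <= u_k <= 1, plus a terminal-state bound."""
    N = len(delta_seq)
    # Per-step cost depends on the fixed discrete decision (hypothetical).
    c = np.array([1.0 + 0.5 * d for d in delta_seq])
    # Terminal constraint x_N <= x_max, where
    # x_N = a^N * x0 + sum_k a^(N-1-k) * b * u_k.
    x_max = 2.0
    A_ub = np.array([[a ** (N - 1 - k) * b for k in range(N)]])
    b_ub = np.array([x_max - a ** N * x0[0]])
    bounds = [(0.0, 1.0)] * N        # input bounds 0 <= u_k <= 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x, res.fun

delta = rl_discrete_policy(x0, horizon)
u_opt, cost = mpc_lp(x0, delta)
print("fixed binaries:", delta, "continuous inputs:", u_opt, "cost:", cost)
```

Because the combinatorial choice is delegated to the learned policy, only the continuous part of the problem is optimized online, which is the mechanism behind the reported reduction in worst-case computation time.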