{"title":"Stable Inverse Reinforcement Learning: Policies From Control Lyapunov Landscapes","authors":"SAMUEL TESFAZGI;Leonhard Sprandl;Armin Lederer;Sandra Hirche","doi":"10.1109/OJCSYS.2024.3447464","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3447464","url":null,"abstract":"Learning from expert demonstrations to flexibly program an autonomous system with complex behaviors or to predict an agent's behavior is a powerful tool, especially in collaborative control settings. A common method to solve this problem is inverse reinforcement learning (IRL), where the observed agent, e.g., a human demonstrator, is assumed to behave according to the optimization of an intrinsic cost function that reflects its intent and informs its control actions. While the framework is expressive, the inferred control policies generally lack convergence guarantees, which are critical for safe deployment in real-world settings. We therefore propose a novel, stability-certified IRL approach by reformulating the cost function inference problem to learning control Lyapunov functions (CLF) from demonstrations data. By additionally exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs by observing the attractor landscape of the induced dynamics. For the construction of the inverse optimal CLFs, we use a Sum of Squares and formulate a convex optimization problem. We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world, human-generated data.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"358-374"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10643266","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Boost the Performance of Stable Nonlinear Systems","authors":"Luca Furieri;Clara Lucía Galimberti;Giancarlo Ferrari-Trecate","doi":"10.1109/OJCSYS.2024.3441768","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3441768","url":null,"abstract":"The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning algorithms. However, maintaining closed-loop stability while boosting the performance of nonlinear control systems using data-driven and deep-learning approaches stands as an important unsolved challenge. In this paper, we tackle the performance-boosting problem with closed-loop stability guarantees. Specifically, we establish a synergy between the Internal Model Control (IMC) principle for nonlinear systems and state-of-the-art unconstrained optimization approaches for learning stable dynamics. Our methods enable learning over specific classes of deep neural network performance-boosting controllers for stable nonlinear systems; crucially, we guarantee \u0000<inline-formula><tex-math>$mathcal {L}_{p}$</tex-math></inline-formula>\u0000 closed-loop stability even if optimization is halted prematurely. When the ground-truth dynamics are uncertain, we learn over robustly stabilizing control policies. Our robustness result is tight, in the sense that all stabilizing policies are recovered as the \u0000<inline-formula><tex-math>$mathcal {L}_{p}$</tex-math></inline-formula>\u0000 -gain of the model mismatch operator is reduced to zero. We discuss the implementation details of the proposed control schemes, including distributed ones, along with the corresponding optimization procedures, demonstrating the potential of freely shaping the cost functions through several numerical experiments.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"342-357"},"PeriodicalIF":0.0,"publicationDate":"2024-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10633771","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142316492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributionally Robust Policy and Lyapunov-Certificate Learning","authors":"Kehan Long;Jorge Cortés;Nikolay Atanasov","doi":"10.1109/OJCSYS.2024.3440051","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3440051","url":null,"abstract":"This article presents novel methods for synthesizing distributionally robust stabilizing neural controllers and certificates for control systems under model uncertainty. A key challenge in designing controllers with stability guarantees for uncertain systems is the accurate determination of and adaptation to shifts in model parametric uncertainty during online deployment. We tackle this with a novel distributionally robust formulation of the Lyapunov derivative chance constraint ensuring a monotonic decrease of the Lyapunov certificate. To avoid the computational complexity involved in dealing with the space of probability measures, we identify a sufficient condition in the form of deterministic convex constraints that ensures the Lyapunov derivative constraint is satisfied. We integrate this condition into a loss function for training a neural network-based controller and show that, for the resulting closed-loop system, the global asymptotic stability of its equilibrium can be certified with high confidence, even with Out-of-Distribution (OoD) model uncertainties. To demonstrate the efficacy and efficiency of the proposed methodology, we compare it with an uncertainty-agnostic baseline approach and several reinforcement learning approaches in two control problems in simulation. Open-source implementations of the examples are available at \u0000<uri>https://github.com/KehanLong/DR_Stabilizing_Policy</uri>\u0000.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"375-388"},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10629071","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global Multi-Phase Path Planning Through High-Level Reinforcement Learning","authors":"Babak Salamat;Sebastian-Sven Olzem;Gerhard Elsbacher;Andrea M. Tonello","doi":"10.1109/OJCSYS.2024.3435080","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3435080","url":null,"abstract":"In this paper, we introduce the \u0000<italic>Global Multi-Phase Path Planning</i>\u0000 (\u0000<monospace><inline-formula><tex-math>$GMP^{3}$</tex-math></inline-formula></monospace>\u0000) algorithm in planner problems, which computes fast and feasible trajectories in environments with obstacles, considering physical and kinematic constraints. Our approach utilizes a Markov Decision Process (MDP) framework and high-level reinforcement learning techniques to ensure trajectory smoothness, continuity, and compliance with constraints. Through extensive simulations, we demonstrate the algorithm's effectiveness and efficiency across various scenarios. We highlight existing path planning challenges, particularly in integrating dynamic adaptability and computational efficiency. The results validate our method's convergence guarantees using Lyapunov’s stability theorem and underscore its computational advantages.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"405-415"},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10613437","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142430772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk-Aware Stochastic MPC for Chance-Constrained Linear Systems","authors":"Pouria Tooranjipour;Bahare Kiumarsi;Hamidreza Modares","doi":"10.1109/OJCSYS.2024.3421372","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3421372","url":null,"abstract":"This paper presents a fully risk-aware model predictive control (MPC) framework for chance-constrained discrete-time linear control systems with process noise. Conditional value-at-risk (CVaR) as a popular coherent risk measure is incorporated in both the constraints and the cost function of the MPC framework. This allows the system to navigate the entire spectrum of risk assessments, from worst-case to risk-neutral scenarios, ensuring both constraint satisfaction and performance optimization in stochastic environments. The recursive feasibility and risk-aware exponential stability of the resulting risk-aware MPC are demonstrated through rigorous theoretical analysis by considering the disturbance feedback policy parameterization. In the end, two numerical examples are given to elucidate the efficacy of the proposed method.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"282-294"},"PeriodicalIF":0.0,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10578318","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141631005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging the Turnpike Effect for Mean Field Games Numerics","authors":"René A. Carmona;Claire Zeng","doi":"10.1109/OJCSYS.2024.3419642","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3419642","url":null,"abstract":"Recently, a deep-learning algorithm referred to as Deep Galerkin Method (DGM), has gained a lot of attention among those trying to solve numerically Mean Field Games with finite horizon, even if the performance seems to be decreasing significantly with increasing horizon. On the other hand, it has been proven that some specific classes of Mean Field Games enjoy some form of the turnpike property identified over seven decades ago by economists. The gist of this phenomenon is a proof that the solution of an optimal control problem over a long time interval spends most of its time near the stationary solution of the ergodic version of the corresponding infinite horizon optimization problem. After reviewing the implementation of DGM for finite horizon Mean Field Games, we introduce a “turnpike-accelerated” version that incorporates the turnpike estimates in the loss function to be optimized, and we perform a comparative numerical analysis to show the advantages of this accelerated version over the baseline DGM algorithm. We demonstrate on some of the Mean Field Game models with local-couplings known to have the turnpike property, as well as a new class of linear-quadratic models for which we derive explicit turnpike estimates.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"389-404"},"PeriodicalIF":0.0,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10572276","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142376852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Concurrent Learning of Control Policy and Unknown Safety Specifications in Reinforcement Learning","authors":"Lunet Yifru;Ali Baheri","doi":"10.1109/OJCSYS.2024.3418306","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3418306","url":null,"abstract":"Reinforcement learning (RL) has revolutionized decision-making across a wide range of domains over the past few decades. Yet, deploying RL policies in real-world scenarios presents the crucial challenge of ensuring safety. Traditional safe RL approaches have predominantly focused on incorporating predefined safety constraints into the policy learning process. However, this reliance on predefined safety constraints poses limitations in dynamic and unpredictable real-world settings where such constraints may not be available or sufficiently adaptable. Bridging this gap, we propose a novel approach that concurrently learns a safe RL control policy and identifies the unknown safety constraint parameters of a given environment. Initializing with a parametric signal temporal logic (pSTL) safety specification and a small initial labeled dataset, we frame the problem as a bilevel optimization task, intricately integrating constrained policy optimization, using a Lagrangian-variant of the twin delayed deep deterministic policy gradient (TD3) algorithm, with Bayesian optimization for optimizing parameters for the given pSTL safety specification. Through experimentation in comprehensive case studies, we validate the efficacy of this approach across varying forms of environmental constraints, consistently yielding safe RL policies with high returns. Furthermore, our findings indicate successful learning of STL safety constraint parameters, exhibiting a high degree of conformity with true environmental safety constraints. The performance of our model closely mirrors that of an ideal scenario that possesses complete prior knowledge of safety constraints, demonstrating its proficiency in accurately identifying environmental safety constraints and learning safe policies that adhere to those constraints. A Python implementation of the algorithm can be found at \u0000<uri>https://github.com/SAILRIT/Concurrent-Learning-of-Control-Policy-and-Unknown-Constraints-in-Reinforcement-Learning.git</uri>\u0000.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"266-281"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10569078","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving Decision-Dependent Games by Learning From Feedback","authors":"Killian Wood;Ahmed S. Zamzam;Emiliano Dall'Anese","doi":"10.1109/OJCSYS.2024.3416768","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3416768","url":null,"abstract":"This paper tackles the problem of solving stochastic optimization problems with a decision-dependent distribution in the setting of stochastic strongly-monotone games and when the distributional dependence is unknown. A two-stage approach is proposed, which initially involves estimating the distributional dependence on decision variables, and subsequently optimizing over the estimated distributional map. The paper presents guarantees for the approximation of the cost of each agent. Furthermore, a stochastic gradient-based algorithm is developed and analyzed for finding the Nash equilibrium in a distributed fashion. Numerical simulations are provided for a novel electric vehicle charging market formulation using real-world data.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"295-309"},"PeriodicalIF":0.0,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10564130","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141964790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sorta Solving the OPF by Not Solving the OPF: DAE Control Theory and the Price of Realtime Regulation","authors":"Muhammad Nadeem;Ahmad F. Taha","doi":"10.1109/OJCSYS.2024.3414221","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3414221","url":null,"abstract":"This paper presents a new approach to approximate the AC optimal power flow (ACOPF). By eliminating the need to solve the ACOPF every few minutes, the paper showcases how a realtime feedback controller can be utilized in lieu of ACOPF and its variants. By \u0000<italic>i)</i>\u0000 forming the grid dynamics as a system of differential-algebraic equations (DAE) that naturally encode the non-convex OPF power flow constraints, \u0000<italic>ii)</i>\u0000 utilizing DAE-Lyapunov theory, and \u0000<italic>iii)</i>\u0000 designing a feedback controller that captures realtime uncertainty while being uncertainty-unaware, the presented approach demonstrates promises of obtaining solutions that are close to the OPF ones without needing to solve the OPF. The proposed controller responds in realtime to deviations in renewables generation and loads, guaranteeing improvements in system transient stability, while always yielding approximate solutions of the ACOPF with no constraint violations. As the studied approach herein yields slightly more expensive realtime generator controls, the corresponding price of realtime control and regulation is examined. Cost comparisons with the traditional ACOPF are also showcased—all via case studies on standard power networks.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"253-265"},"PeriodicalIF":0.0,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10556752","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141474874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regional PID Control of Switched Positive Systems With Multiple Equilibrium Points","authors":"Pei Zhang;Junfeng Zhang;Xuan Jia","doi":"10.1109/OJCSYS.2024.3391001","DOIUrl":"https://doi.org/10.1109/OJCSYS.2024.3391001","url":null,"abstract":"This paper investigates the regional control problem of switched positive systems with multiple equilibrium points. A proportional-integral-derivative controller is designed by combining the output, the error between the state and the equilibrium point, and the difference of output. A cone is introduced to design the final stable region. Two classes of copositive Lyapunov functions are constructed to achieve the stability and regional stability of subsystems and the whole systems, respectively. Then, a novel class of observers with multiple equilibrium points is proposed using a matrix decomposition approach. The observer-based proportional-integral-derivative control problem is thus solved and all states are driven to the designed cone region under the designed controller. All conditions are formulated in the form of linear programming. The novelties of this paper lie in that: (i) A proportional-integral-derivative control framework is introduced for the considered systems, (ii) Luenberger observer is developed for the observer with multiple equilibrium points, and (iii) Copositive Lyapunov functions and linear programming are employed for the analysis and design of controller and observer. Finally, the effectiveness of the proposed design is verified via two examples.","PeriodicalId":73299,"journal":{"name":"IEEE open journal of control systems","volume":"3 ","pages":"190-201"},"PeriodicalIF":0.0,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10504945","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140818730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}