Enhancing Safety in Model-Based Reinforcement Learning With High-Order Control Barrier Functions
Authors: Tianyu Zhang, Jun Xu, Hongwei Zhang
Journal: International Journal of Robust and Nonlinear Control, vol. 35, no. 9, pp. 3844-3855 (IF 3.2, JCR Q2, Automation & Control Systems)
Published: 2025-02-26
DOI: 10.1002/rnc.7888
URL: https://onlinelibrary.wiley.com/doi/10.1002/rnc.7888
Because taking unsafe actions under unknown environment dynamics is risky, reinforcement learning (RL) algorithms with built-in safety guarantees that prevent unexpected accidents have received increasing attention. Introducing a control barrier function is a typical way to impose safety constraints by constructing a forward invariant set, but this approach generally suffers from the conservativeness of that set and from difficulties in the training process. To overcome these challenges, this paper proposes a novel algorithm called model-based safe RL with high-order control barrier function (MBSRL-HOCBF). The concept of generalized feasibility is introduced, comprising the generalized feasible state and the generalized feasible region, which are applied to modified HOCBF conditions during training; this reduces the conservativeness of the HOCBF forward invariant set while ensuring both safety and algorithm performance. Additionally, a safety indicator that explicitly identifies safe states without requiring knowledge of specific safety criteria is incorporated into the common environment model. This integration combines the advantages of traditional model-based RL, such as using model-generated data to speed up training, with the ability to identify the generalized feasibility of each state. Simulation results demonstrate that MBSRL-HOCBF not only achieves high returns but also guarantees safety across multiple control tasks.
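For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of a textbook high-order control barrier function used as a safety filter. It is not the paper's modified HOCBF condition, generalized feasibility, or safety indicator. The double-integrator dynamics, the position limit `p_max`, the linear class-K gains `k1` and `k2`, and all function names are assumptions chosen for illustration. With b(x) = p_max - p (relative degree 2), psi1 = b' + k1*b and psi2 = psi1' + k2*psi1 >= 0 reduce to an upper bound on the acceleration.

```python
def hocbf_accel_bound(p, v, p_max, k1, k2):
    """Upper bound on acceleration u implied by psi2 >= 0 (illustrative)."""
    # psi1 = d/dt(p_max - p) + k1 * (p_max - p) = -v + k1 * (p_max - p)
    psi1 = -v + k1 * (p_max - p)
    # psi2 = d/dt(psi1) + k2 * psi1 = -u - k1 * v + k2 * psi1 >= 0
    # which rearranges to u <= -k1 * v + k2 * psi1
    return -k1 * v + k2 * psi1


def simulate(u_nominal, use_filter, p_max=1.0, k1=2.0, k2=2.0,
             dt=0.01, steps=500):
    """Explicit-Euler rollout of p' = v, v' = u from rest at p = 0."""
    p, v = 0.0, 0.0
    positions = [p]
    for _ in range(steps):
        u = u_nominal
        if use_filter:
            # Clip the nominal action so the HOCBF condition holds.
            u = min(u, hocbf_accel_bound(p, v, p_max, k1, k2))
        # Both states are updated from the previous step's values.
        p, v = p + dt * v, v + dt * u
        positions.append(p)
    return positions


if __name__ == "__main__":
    filtered = simulate(u_nominal=1.0, use_filter=True)
    unfiltered = simulate(u_nominal=1.0, use_filter=False)
    print("max position with filter:   ", max(filtered))
    print("max position without filter:", max(unfiltered))
```

In this toy setting the filtered rollout stays inside the set p <= p_max while the unfiltered one leaves it. The filter starts braking well before the boundary, and larger gains delay that intervention; this early intervention illustrates the conservativeness of the forward invariant set that the abstract refers to.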
About the journal:
Papers that do not include an element of robust or nonlinear control and estimation theory will not be considered by the journal, and all papers will be expected to include significant novel content. The focus of the journal is on model-based control design approaches rather than heuristic or rule-based methods. Papers on neural networks will have to be of exceptional novelty to be considered for the journal.