Safe and time-efficient exploration in Reinforcement Learning-based control of a vehicle thermal systems

IF 4.6 | Zone 2, Computer Science | JCR Q1, AUTOMATION & CONTROL SYSTEMS

Control Engineering Practice | Pub Date: 2025-07-02 | DOI: 10.1016/j.conengprac.2025.106458

Prasoon Garg, Emilia Silvas, Frank Willems

Control Engineering Practice, Vol. 164, Article 106458. https://www.sciencedirect.com/science/article/pii/S0967066125002205

Citations: 0

Safe and time-efficient exploration in Reinforcement Learning-based control of a vehicle thermal systems

Reinforcement Learning has achieved great success in various applications in controlled environments. However, its use in real-world applications remains limited because existing methods struggle to guarantee safe system operation, require long experiment times, and depend on a-priori system knowledge and models. In this work, we propose a novel exploration method that simultaneously addresses safe and time-efficient exploration while dealing with system uncertainty. The method integrates a reciprocal Control Barrier Function with an on-line learned Gaussian Process Regression model. For safe system operation, we use information from the reciprocal Control Barrier Function to limit the step size of the agent's actions when it approaches the safety boundary. To make exploration time-efficient, we determine the direction of the agent's actions from information gain metrics, which are calculated from the action-value estimates of the on-line learned Gaussian Process Regression model. We demonstrate the potential of our exploration method in simulation and on a vehicle test bench for the efficiency-optimal calibration of a thermal management system for battery electric vehicles. To quantify the benefits in terms of safety, optimality, and time efficiency, we benchmark our method against random and uncertainty-driven exploration methods in a simulation environment. For the studied test case, the proposed method satisfies the safety constraint and converges to within 1.25% of the true optimal action, while requiring 28% and 18% less experiment time than the random and uncertainty-driven exploration methods, respectively. The performance of the proposed method is also demonstrated on a vehicle test bench. Experimental results show that the maximal thermal system efficiency is realized within 2% of the true optimum, while the safety constraints are effectively handled.
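The abstract names two ingredients: a reciprocal Control Barrier Function that shrinks the action step near the safety boundary, and an on-line Gaussian Process Regression (GPR) model of action-values whose information gain sets the step direction. The exact formulas are not given in the abstract, so the sketch below is hypothetical throughout and is not the authors' method. Its assumptions are the toy plant (`plant`, `T_LIMIT`), the step rule `step = delta_max / (1 + gamma * B)` with reciprocal barrier `B = 1/h`, the squared-exponential kernel and its hyperparameters, and the direction score `mean + beta * IG`. Here `IG = 0.5 * log(1 + var / noise)` is the standard single-observation GP information gain, used as a stand-in for the paper's metrics.

```python
import math
import random

# --- Hypothetical toy plant (made up; NOT the paper's thermal model) ---------
# One calibration action a in [0, 1]; we want high efficiency, while the
# temperature must stay below T_LIMIT, i.e. safety margin h = T_LIMIT - T >= 0.
T_LIMIT = 60.0

def plant(a):
    efficiency = 1.0 - (a - 0.7) ** 2   # optimum at a = 0.7 (invented)
    temperature = 30.0 + 35.0 * a       # hits T_LIMIT at a ~ 0.857 (invented)
    return efficiency, temperature

# --- Reciprocal barrier used to limit the step size (assumed rule) ----------
def rcbf_step(h, delta_max=0.1, gamma=5.0):
    """step = delta_max / (1 + gamma * B) with reciprocal barrier B = 1/h.

    B grows without bound as h -> 0+, so the step shrinks to zero at the
    boundary. For the toy plant (dT/da = 35) the temperature rise per step is
    3.5*h/(h + 5) < h, so the margin h stays positive.
    """
    B = 1.0 / h if h > 0 else math.inf
    return delta_max / (1.0 + gamma * B)

# --- Minimal GP regression (squared-exponential kernel, pure Python) --------
def rbf(x, y, ell=0.15, sf2=0.1):
    return sf2 * math.exp(-0.5 * ((x - y) / ell) ** 2)

def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):   # solve L x = b (L lower-triangular)
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def backward(L, b):  # solve L^T x = b
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, Y, a, noise=1e-4):
    """Posterior mean and variance of the action-value at action a."""
    y_mean = sum(Y) / len(Y)               # centre targets (zero-mean prior)
    K = [[rbf(xi, xj) + (noise if i == j else 0.0) for j, xj in enumerate(X)]
         for i, xi in enumerate(X)]
    L = cholesky(K)
    alpha = backward(L, forward(L, [y - y_mean for y in Y]))
    k = [rbf(a, xi) for xi in X]
    v = forward(L, k)
    mean = y_mean + sum(ki * al for ki, al in zip(k, alpha))
    var = max(rbf(a, a) - sum(vi * vi for vi in v), 1e-12)
    return mean, var

def info_gain(var, noise=1e-4):
    """Information gain of one noisy observation: 0.5 * log(1 + var/noise)."""
    return 0.5 * math.log(1.0 + var / noise)

# --- Exploration loop: barrier sets the step size, GP picks the direction ---
def explore(a0=0.1, n_steps=25, beta=0.1, seed=0):
    rng = random.Random(seed)
    X, Y, a, min_margin = [], [], a0, math.inf
    for _ in range(n_steps):
        eff, temp = plant(a)
        X.append(a)
        Y.append(eff + rng.gauss(0.0, 0.01))   # noisy efficiency measurement
        h = T_LIMIT - temp
        min_margin = min(min_margin, h)
        step = rcbf_step(h)
        cands = [min(max(a + s * step, 0.0), 1.0) for s in (1.0, -1.0)]
        scores = [m + beta * info_gain(v)
                  for m, v in (gp_posterior(X, Y, c) for c in cands)]
        a = cands[scores.index(max(scores))]
    return X, Y, min_margin
```

For example, `X, Y, margin = explore()` returns the visited actions, their noisy efficiency measurements and the smallest safety margin observed, and `X[Y.index(max(Y))]` gives the best action tried. The split of roles (the barrier only scales how far the agent moves, the GP only decides which way) mirrors the division described in the abstract; the specific rules are illustrative choices.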

Source journal

Control Engineering Practice (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 9.20
Self-citation rate: 12.20%
Publication volume: 183
Review time: 44 days

About the journal: Control Engineering Practice strives to meet the needs of industrial practitioners and industrially related academics and researchers. It publishes papers that illustrate the direct application of control theory and its supporting tools in all possible areas of automation. As a result, the journal only contains papers that can be considered to have made significant contributions to the application of advanced control techniques. Practical results are normally expected, but where only simulation studies are available, it must be demonstrated that the simulation model is representative of a genuine application. Strictly theoretical papers will find a more appropriate home in Control Engineering Practice's sister publication, Automatica. Papers are also expected to be innovative with respect to the state of the art and sufficiently detailed for a reader to be able to duplicate their main results (supplementary material, including datasets, tables, code and any relevant interactive material, can be made available and downloaded from the website). The benefits of the presented methods must be made very clear, and the new techniques must be compared and contrasted with results obtained using existing methods. Moreover, a thorough analysis of failures that may happen in the design process and implementation can also be part of the paper. The scope of Control Engineering Practice matches the activities of IFAC. Papers that demonstrate the contribution of automation and control to improving the performance, quality, productivity, sustainability, resource and energy efficiency, and manageability of systems and processes for the benefit of mankind, and that are relevant to industrial practitioners, are most welcome.