Building demand response control through constrained reinforcement learning with linear policies

IF 11.0, Zone 1 (Engineering and Technology), JCR Q1 (ENERGY & FUELS)
Jerson Sanchez, Jie Cai
Applied Energy, volume 398, Article 126404. Published 2025-07-16.
DOI: 10.1016/j.apenergy.2025.126404
https://www.sciencedirect.com/science/article/pii/S0306261925011341
Citations: 0

Abstract
Recent advances in model-free control strategies, particularly reinforcement learning (RL), have enabled more practical and scalable solutions for controlling building energy systems. These strategies rely solely on data, so control decisions no longer require complex models of building dynamics, which are expensive to develop and demand significant engineering effort. Conventional unconstrained RL controllers typically manage indoor comfort by adding a penalty for comfort violations to the reward function. This penalty-function approach makes control performance very sensitive to the penalty factor: a low factor can result in significant violations of comfort constraints, while a high factor tends to degrade economic performance. To address this issue, this study presents a constrained RL-based control strategy for building demand response that explicitly learns a constraint value function from operation data. Both linear mappings and deep neural networks are considered for value and policy function approximation, and their training stability and control performance are evaluated in terms of economic return and constraint satisfaction. Simulation tests of the proposed strategy, together with baseline model predictive controllers (MPC) and unconstrained RL strategies, demonstrate that the constrained RL approach could achieve utility cost savings of up to 16.1%, comparable to those achieved with the MPC baselines, while minimizing constraint violations. In contrast, the unconstrained RL controllers lead to either high utility costs or significant constraint violations, depending on the penalty factor setting. The constrained RL strategy with linear policy and value functions trains more stably and offers 4% additional cost savings with reduced constraint violations compared to constrained RL controllers with neural networks.
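The abstract does not spell out the algorithm, so the following is only a rough sketch of the contrast it draws, assuming a standard Lagrangian (primal-dual) treatment of the comfort constraint; the authors' actual method may differ. All function names, parameters, and numeric values are hypothetical and chosen for illustration only.

```python
def linear_policy(theta, state, a_min=0.0, a_max=1.0):
    """Linear policy (hypothetical): action = theta . state, clipped to actuator limits."""
    raw = sum(t * s for t, s in zip(theta, state))
    return min(a_max, max(a_min, raw))


def penalized_reward(energy_cost, comfort_violation, penalty_factor):
    """Unconstrained RL: comfort enters the reward through a fixed, hand-tuned factor.

    Too small a factor tolerates violations; too large a factor sacrifices cost savings.
    """
    return -energy_cost - penalty_factor * comfort_violation


def dual_update(multiplier, constraint_return, limit, step_size):
    """Assumed Lagrangian-style constrained RL step: adapt the multiplier from data.

    The multiplier grows while the estimated constraint return exceeds the allowed
    limit and decays toward zero (never below) once the constraint is satisfied.
    """
    return max(0.0, multiplier + step_size * (constraint_return - limit))
```

In this sketch the multiplier plays the role of the penalty factor, but it is adjusted automatically from an estimate of the constraint return (for example, one produced by a learned constraint value function) rather than fixed in advance.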
Source journal
Applied Energy (Engineering and Technology, Engineering: Chemical)
CiteScore: 21.20
Self-citation rate: 10.70%
Articles published: 1830
Review time: 41 days
About the journal: Applied Energy serves as a platform for sharing innovations, research, development, and demonstrations in energy conversion, conservation, and sustainable energy systems. The journal covers topics such as optimal energy resource use, environmental pollutant mitigation, and energy process analysis. It welcomes original papers, review articles, technical notes, and letters to the editor. Authors are encouraged to submit manuscripts that bridge the gap between research, development, and implementation. The journal addresses a wide spectrum of topics, including fossil and renewable energy technologies, energy economics, and environmental impacts. Applied Energy also explores modeling and forecasting, conservation strategies, and the social and economic implications of energy policies, including climate change mitigation. It is complemented by the open-access journal Advances in Applied Energy.