Deep reinforcement learning-based control of thermal energy storage for university classrooms: Co-Simulation with TRNSYS-Python and transfer learning across operational scenarios
Giacomo Buscemi , Francesco Paolo Cuomo , Giuseppe Razzano , Francesco Liberato Cappiello , Silvio Brandi
{"title":"Deep reinforcement learning-based control of thermal energy storage for university classrooms: Co-Simulation with TRNSYS-Python and transfer learning across operational scenarios","authors":"Giacomo Buscemi , Francesco Paolo Cuomo , Giuseppe Razzano , Francesco Liberato Cappiello , Silvio Brandi","doi":"10.1016/j.egyr.2025.07.003","DOIUrl":null,"url":null,"abstract":"<div><div>Advanced controllers leveraging predictive and adaptive methods play a crucial role in optimizing building energy management and enhancing flexibility for maximizing the use of on-site renewable energy sources. This study investigates the application of Deep Reinforcement Learning (DRL) algorithms for optimizing the operation of an Air Handling Unit (AHU) system coupled with an inertial Thermal Energy Storage (TES) tank, serving educational spaces at the <em>Politecnico di Torino</em> campus. The TES supports both cooling in summer and heating in winter through the AHU’s thermal exchange coils, while proportional control mechanisms regulate downstream air conditions to ensure indoor comfort. The building’s electric energy demand is partially met by a 47 kW photovoltaic (PV) field and by power grid.</div><div>To evaluate the performance of DRL controllers, a co-simulation framework integrating TRNSYS 18 with Python was developed. This setup enables a comparative assessment between a baseline Rule-Based (RB) control approach, and a Soft Actor-Critic (SAC) Reinforcement Learning (RL) controller. The DRL controllers are trained to minimize heat pump (HP) energy consumption, strategically aligning operations with PV availability and electricity price fluctuations, while enforcing safety constraints to prevent temperature violations in the TES.</div><div>Simulation results demonstrate that the DRL approach achieved a 4.73 MWh reduction in annual primary energy consumption and a 3.2% decrease in operating costs compared to RB control, with electricity cost savings reaching 5.8%. To evaluate the controller’s generalizability, zero-shot transfer learning was employed to deploy the pre-trained DRL agent across different climatic conditions, tariff structures, and system fault scenarios without retraining. In comparison with the RB control, the transfer learning results demonstrated high adaptability, with electricity cost reductions ranging from 11.2% to 24.5% and gas consumption savings between 7.1% and 69.7% across diverse operating conditions.</div></div>","PeriodicalId":11798,"journal":{"name":"Energy Reports","volume":"14 ","pages":"Pages 1349-1367"},"PeriodicalIF":5.1000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Energy Reports","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2352484725004172","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENERGY & FUELS","Score":null,"Total":0}
Abstract
Advanced controllers leveraging predictive and adaptive methods play a crucial role in optimizing building energy management and enhancing flexibility for maximizing the use of on-site renewable energy sources. This study investigates the application of Deep Reinforcement Learning (DRL) algorithms for optimizing the operation of an Air Handling Unit (AHU) system coupled with an inertial Thermal Energy Storage (TES) tank, serving educational spaces at the Politecnico di Torino campus. The TES supports both cooling in summer and heating in winter through the AHU’s thermal exchange coils, while proportional control mechanisms regulate downstream air conditions to ensure indoor comfort. The building’s electric energy demand is partially met by a 47 kW photovoltaic (PV) field and by the power grid.
To evaluate the performance of DRL controllers, a co-simulation framework integrating TRNSYS 18 with Python was developed. This setup enables a comparative assessment between a baseline Rule-Based (RB) control approach and a Soft Actor-Critic (SAC) Reinforcement Learning (RL) controller. The DRL controllers are trained to minimize heat pump (HP) energy consumption, strategically aligning operations with PV availability and electricity price fluctuations, while enforcing safety constraints to prevent temperature violations in the TES.
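A minimal sketch of how such a TRNSYS-Python co-simulation loop and a SAC agent could be coupled is given below. The abstract does not detail the data-exchange mechanism or the reward design, so the TrnsysCoSim bridge, the observation and action definitions, the TES temperature bounds, and the penalty weight are all hypothetical placeholders; Gymnasium and Stable-Baselines3 are assumed purely for illustration and are not confirmed by the paper. The reward mirrors the stated objectives: purchased electricity cost (HP demand net of PV generation, weighted by the time-varying price) is minimized, and TES temperature violations are penalized.

```python
# Illustrative sketch only: the TrnsysCoSim bridge, bounds, and weights are
# hypothetical placeholders, not the implementation described in the paper.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC


class TesAhuEnv(gym.Env):
    """RL environment wrapping one TRNSYS co-simulation timestep."""

    def __init__(self, cosim, t_tes_min=8.0, t_tes_max=14.0, penalty=10.0):
        self.cosim = cosim                        # hypothetical TRNSYS 18 bridge
        self.t_min, self.t_max = t_tes_min, t_tes_max
        self.penalty = penalty
        # Observation: TES temperature, outdoor temperature, PV generation,
        # electricity price, hour of day (all placeholders)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(5,), dtype=np.float32)
        # Action: normalized heat pump modulation level
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = self.cosim.reset()                  # restart the TRNSYS simulation
        return np.asarray(obs, dtype=np.float32), {}

    def step(self, action):
        # Advance TRNSYS by one timestep with the chosen HP command
        obs, hp_kwh, pv_kwh, price, t_tes, done = self.cosim.step(float(action[0]))
        # Reward: minimize purchased electricity cost, penalize TES violations
        grid_kwh = max(hp_kwh - pv_kwh, 0.0)
        reward = -price * grid_kwh
        if not (self.t_min <= t_tes <= self.t_max):
            reward -= self.penalty
        return np.asarray(obs, dtype=np.float32), reward, done, False, {}


# Training loop with hypothetical hyperparameters:
# env = TesAhuEnv(cosim=TrnsysCoSim("campus_classrooms.dck"))
# agent = SAC("MlpPolicy", env, learning_rate=3e-4, verbose=1)
# agent.learn(total_timesteps=200_000)
# agent.save("sac_tes_ahu")
```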
Simulation results demonstrate that the DRL approach achieved a 4.73 MWh reduction in annual primary energy consumption and a 3.2% decrease in operating costs compared to RB control, with electricity cost savings reaching 5.8%. To evaluate the controller’s generalizability, zero-shot transfer learning was employed to deploy the pre-trained DRL agent across different climatic conditions, tariff structures, and system fault scenarios without retraining. In comparison with the RB control, the transfer learning results demonstrated high adaptability, with electricity cost reductions ranging from 11.2% to 24.5% and gas consumption savings between 7.1% and 69.7% across diverse operating conditions.
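Zero-shot transfer in this setting amounts to reusing the trained policy weights unchanged on a new simulation configuration (different climate file, tariff structure, or fault scenario). A hedged evaluation sketch, reusing the hypothetical environment above, is shown below; the file names and the evaluate helper are illustrative only.

```python
# Hypothetical zero-shot evaluation: the pre-trained policy is loaded and run
# on a new scenario (different climate, tariff, or fault) with no retraining.
from stable_baselines3 import SAC


def evaluate(agent, env, n_steps=8760):
    """Roll out the frozen policy for one simulated year of hourly steps."""
    obs, _ = env.reset()
    total_cost = 0.0
    for _ in range(n_steps):
        action, _ = agent.predict(obs, deterministic=True)   # no exploration
        obs, reward, terminated, truncated, _ = env.step(action)
        total_cost += -reward        # negative reward = cost plus penalties
        if terminated or truncated:
            break
    return total_cost


# agent = SAC.load("sac_tes_ahu")                             # weights frozen
# new_env = TesAhuEnv(cosim=TrnsysCoSim("alternative_climate_tariff.dck"))
# print("Zero-shot annual cost:", evaluate(agent, new_env))
```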
Journal Introduction
Energy Reports is a new online multidisciplinary open access journal that focuses on publishing new research in the area of energy with rapid review and publication times. Energy Reports is open to direct submissions and also to submissions from other Elsevier energy journals whose Editors have determined that Energy Reports would be a better fit.