A self-optimizing defrost initiation controller for air-source heat pumps: Experimental validation of deep reinforcement learning
Jonas Klingebiel, Christoph Höges, Janik Horst, Oliver Nießen, Valerius Venzik, Christian Vering, Dirk Müller
Applied Energy, Volume 398, Article 126400 (17 July 2025). DOI: 10.1016/j.apenergy.2025.126400
Citations: 0
Abstract
Air-source heat pumps (ASHPs) play a key role in sustainable heating, but their efficiency is significantly reduced by frost formation on the evaporator. The timing of defrost initiation is crucial to minimize energy losses, yet conventional demand-based defrosting (DBD) controllers rely on specialized sensors for frost detection and heuristic thresholds for defrost initiation, leading to increased system costs and suboptimal performance. This paper presents an experimental validation of a self-optimizing deep reinforcement learning (RL) controller. With our proposed implementation, RL determines defrost timing using standard temperature measurements and autonomously generates tailored control rules, overcoming the limitations of conventional DBD methods. The validation comprises three case studies conducted on a hardware-in-the-loop test bench with a variable-speed ASHP. First, RL's defrost-timing accuracy is evaluated against experimentally pre-determined optima. Across five stationary test conditions, RL achieves near-optimal defrost initiations with efficiency losses of at most 1.9 %. Second, RL is benchmarked against time-based (TBD) and demand-based (DBD) defrost controllers for three typical days with varying ambient conditions. RL outperforms TBD by up to 7.1 % in seasonal coefficient of performance (SCOP) and 3.6 % in heat output. Compared to DBD, RL improves SCOP by up to 9.1 % and heat output by 4.9 %. Finally, we assess RL's ability to adapt its strategy through online learning. We emulate airflow blockage, a common soft fault caused by obstructions on the evaporator fins (e.g., leaves). RL adjusts its strategy to the changed environment and improves efficiency by 16.6 %. While the results are promising, limitations remain, requiring further research to validate RL in real-world ASHPs.
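Here SCOP is the ratio of total heat delivered to total electrical energy consumed over the evaluation period. The abstract gives no implementation details, but the decision problem it describes (at each control step, either continue heating or initiate a defrost cycle, using only standard temperature measurements) can be illustrated with a minimal sketch. The Python snippet below is a hypothetical illustration, not the authors' controller: the state features, the tabular epsilon-greedy Q-learning agent, and the COP-style reward are assumptions made for exposition, whereas the paper itself uses a deep RL policy.

import random
from collections import defaultdict

# Hypothetical sketch of a self-optimizing defrost-initiation loop.
# NOT the paper's implementation: state features, discretization,
# tabular Q-learning, and the COP-based reward are illustrative only.

ACTIONS = (0, 1)  # 0 = continue heating, 1 = initiate defrost

def discretize(t_ambient, t_evap, minutes_since_defrost):
    """Map standard temperature/timer readings to a coarse discrete state."""
    return (round(t_ambient), round(t_evap), minutes_since_defrost // 10)

class DefrostAgent:
    def __init__(self, alpha=0.1, gamma=0.99, epsilon=0.05):
        self.q = defaultdict(float)  # Q[(state, action)] -> value estimate
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:      # occasional exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One temporal-difference step; this online update is what lets
        the policy adapt when the environment changes (e.g., blocked airflow)."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

if __name__ == "__main__":
    agent = DefrostAgent()
    s = discretize(t_ambient=2.0, t_evap=-6.5, minutes_since_defrost=42)
    a = agent.act(s)
    # After the control interval, a COP-style reward (heat delivered divided
    # by electricity consumed) would be measured and fed back (values invented):
    s_next = discretize(2.0, -1.0, 0 if a == 1 else 52)
    agent.update(s, a, reward=2.8, next_state=s_next)

In the actual system a neural network would replace the Q-table, but the sketch captures why no dedicated frost sensor is required: every input is a temperature or timer reading a standard controller already has, and the online update rule is what allows the strategy to adapt to changed conditions such as the emulated airflow blockage.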
About the journal:
Applied Energy serves as a platform for sharing innovations, research, development, and demonstrations in energy conversion, conservation, and sustainable energy systems. The journal covers topics such as optimal energy resource use, environmental pollutant mitigation, and energy process analysis. It welcomes original papers, review articles, technical notes, and letters to the editor. Authors are encouraged to submit manuscripts that bridge the gap between research, development, and implementation. The journal addresses a wide spectrum of topics, including fossil and renewable energy technologies, energy economics, and environmental impacts. Applied Energy also explores modeling and forecasting, conservation strategies, and the social and economic implications of energy policies, including climate change mitigation. It is complemented by the open-access journal Advances in Applied Energy.