Reward function design in reinforcement learning for HVAC control: A review of thermal comfort and energy efficiency trade-offs

IF 7.1 · Zone 2 (Engineering Technology) · JCR Q1 (Construction & Building Technology)

Energy and Buildings · Pub Date: 2025-09-16 · DOI: 10.1016/j.enbuild.2025.116439

Eisuke Togashi


Citations: 0

Abstract



Reinforcement learning is increasingly applied to Heating, Ventilation, and Air Conditioning control to balance energy efficiency with occupant comfort. However, the design of the reward function, crucial for managing this trade-off, remains a relatively underexplored topic in existing review literature.

This paper addresses this gap through a systematic review of 79 studies published since 2020. We introduce a novel standardization methodology to enable a systematic comparison of the diverse reward formulations reported in the literature.

The analysis reveals two primary findings: (1) substantial heterogeneity in reward function structures, with 68 unique designs identified, which severely impedes research comparability; and (2) a prevalent reliance on empirically derived weighting coefficients that often lack a clear theoretical basis. Furthermore, this study identifies and quantifies the usage patterns of four common design techniques: occupancy consideration, comfort deadbands, error exponentiation, and acceptable limits.

Based on this comprehensive analysis, we propose a typical piecewise reward function structure that synthesizes common best practices and is grounded in established heating, ventilation, and air conditioning domain knowledge. This proposed structure is intended to serve as a foundational baseline, addressing the identified limitations and aiming to improve the comparability of future research in reinforcement-learning-driven heating, ventilation, and air conditioning control.
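To make the four design techniques named in the abstract concrete, the following is a minimal Python sketch of a piecewise, weighted-sum reward that combines occupancy consideration, a comfort deadband, error exponentiation, and an acceptable limit. It is not the structure proposed in the paper, whose exact form is not given in the abstract. The names `RewardConfig` and `reward`, the choice of indoor temperature as the comfort variable, and every numeric value are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RewardConfig:
    # All values are illustrative assumptions, not taken from the paper.
    setpoint_c: float = 23.0      # target indoor temperature [degC]
    deadband_c: float = 1.0       # half-width of the comfort deadband [degC]
    limit_c: float = 4.0          # acceptable limit on the deviation [degC]
    exponent: float = 2.0         # error exponentiation
    comfort_weight: float = 1.0   # empirically chosen weighting coefficient
    energy_weight: float = 0.1    # empirically chosen weighting coefficient
    limit_penalty: float = 100.0  # fixed penalty beyond the acceptable limit


def reward(temp_c: float, energy_kwh: float, occupied: bool,
           cfg: RewardConfig = RewardConfig()) -> float:
    """Return a (non-positive) reward for one control step."""
    energy_term = cfg.energy_weight * energy_kwh

    # Occupancy consideration: comfort is only penalized when occupied.
    if not occupied:
        return -energy_term

    deviation = abs(temp_c - cfg.setpoint_c)
    if deviation <= cfg.deadband_c:
        # Comfort deadband: no penalty close to the setpoint.
        comfort_term = 0.0
    elif deviation <= cfg.limit_c:
        # Error exponentiation: penalty grows with the excess deviation.
        excess = deviation - cfg.deadband_c
        comfort_term = cfg.comfort_weight * excess ** cfg.exponent
    else:
        # Acceptable limit: fixed large penalty outside the tolerated range.
        comfort_term = cfg.limit_penalty

    return -(comfort_term + energy_term)
```

With these assumed defaults, a call such as `reward(25.0, 2.0, occupied=True)` penalizes both the 1 °C excess beyond the deadband and the energy use, while the same state with `occupied=False` is penalized for energy only. The two weights are exactly the kind of empirically chosen coefficients the review identifies as lacking a clear theoretical basis.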


Source journal: Energy and Buildings (Engineering Technology: Civil Engineering)

CiteScore: 12.70

Self-citation rate: 11.90%

Articles published: 863

Review time: 38 days

About the journal: An international journal devoted to investigations of energy use and efficiency in buildings. Energy and Buildings is an international journal publishing articles with explicit links to energy use in buildings. The aim is to present new research results and new proven practice aimed at reducing the energy needs of a building and improving indoor environment quality.