Author: Eisuke Togashi
Journal: Energy and Buildings, Volume 348, Article 116439 (published 2025-09-16)
DOI: 10.1016/j.enbuild.2025.116439
URL: https://www.sciencedirect.com/science/article/pii/S0378778825011697
Reward function design in reinforcement learning for HVAC Control: A review of thermal comfort and energy efficiency Trade-offs
Reinforcement learning is increasingly applied to Heating, Ventilation, and Air Conditioning control to balance energy efficiency with occupant comfort. However, the design of the reward function, crucial for managing this trade-off, remains a relatively underexplored topic in existing review literature.
This paper addresses this gap through a systematic review of 79 studies published since 2020. We introduce a novel standardization methodology to enable a systematic comparison of the diverse reward formulations reported in the literature.
The analysis reveals two primary findings: (1) a substantial heterogeneity in reward function structures, with 68 unique designs identified, a factor that severely impedes research comparability; and (2) a prevalent reliance on empirically derived weighting coefficients that often lack a clear theoretical basis. Furthermore, this study identifies and quantifies the usage patterns of four common design techniques: occupancy consideration, comfort deadbands, error exponentiation, and acceptable limits.
Based on this comprehensive analysis, we propose a typical piecewise reward function structure that synthesizes common best practices and is grounded in established Heating, Ventilation, and Air Conditioning domain knowledge. This proposed structure is intended to serve as a foundational baseline, addressing the identified limitations and aiming to improve the comparability of future research in reinforcement-learning-driven Heating, Ventilation, and Air Conditioning control.
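To make the four design techniques named in the abstract concrete, the following is a minimal sketch of how a piecewise reward might combine occupancy consideration, a comfort deadband, error exponentiation, and acceptable limits. It is not the structure proposed in the paper, which the abstract does not spell out. The function name `hvac_reward`, the `RewardConfig` fields, and every numeric value are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # Every value here is an illustrative assumption, not taken from the paper.
    setpoint_c: float = 24.0      # target indoor temperature [deg C]
    deadband_c: float = 1.0       # half-width of the comfort deadband [deg C]
    limit_c: float = 4.0          # deviation treated as the acceptable limit [deg C]
    exponent: float = 2.0         # power applied to the comfort error
    comfort_weight: float = 1.0   # empirically chosen comfort weight
    energy_weight: float = 0.1    # empirically chosen weight per kWh
    limit_penalty: float = 50.0   # fixed penalty beyond the acceptable limit


def hvac_reward(temp_c: float, energy_kwh: float, occupied: bool,
                cfg: RewardConfig = RewardConfig()) -> float:
    """Return a (non-positive) reward for one control step."""
    energy_term = cfg.energy_weight * energy_kwh

    # Occupancy consideration: comfort is only penalized when people are present.
    if not occupied:
        return -energy_term

    deviation = abs(temp_c - cfg.setpoint_c)
    if deviation <= cfg.deadband_c:
        # Comfort deadband: no penalty close to the setpoint.
        comfort_term = 0.0
    elif deviation <= cfg.limit_c:
        # Error exponentiation: penalty grows nonlinearly outside the deadband.
        excess = deviation - cfg.deadband_c
        comfort_term = cfg.comfort_weight * excess ** cfg.exponent
    else:
        # Acceptable limits: a fixed large penalty beyond the tolerated range.
        comfort_term = cfg.limit_penalty

    return -(energy_term + comfort_term)
```

In this sketch the two weights play the role of the empirically derived weighting coefficients that the review identifies as lacking a theoretical basis: changing `energy_weight` relative to `comfort_weight` shifts the trade-off, and nothing in the code justifies any particular ratio.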
About the journal:
An international journal devoted to investigations of energy use and efficiency in buildings
Energy and Buildings is an international journal publishing articles with explicit links to energy use in buildings. The aim is to present new research results, and new proven practice aimed at reducing the energy needs of a building and improving indoor environment quality.