Online Learning for Personalized Room-Level Thermal Control: A Multi-Armed Bandit Framework

Parisa Mansourifard, F. Jazizadeh, B. Krishnamachari, B. Becerik-Gerber
{"title":"Online Learning for Personalized Room-Level Thermal Control: A Multi-Armed Bandit Framework","authors":"Parisa Mansourifard, F. Jazizadeh, B. Krishnamachari, B. Becerik-Gerber","doi":"10.1145/2528282.2528296","DOIUrl":null,"url":null,"abstract":"We consider the problem of automatically learning the optimal thermal control in a room in order to maximize the expected average satisfaction among occupants providing stochastic feedback on their comfort through a participatory sensing application. Not assuming any prior knowledge or modeling of user comfort, we first apply the classic UCB1 online learning policy for multi-armed bandits (MAB), that combines exploration (testing out certain temperatures to understand better the user preferences) with exploitation (spending more time setting temperatures that maximize average-satisfaction) for the case when the total occupancy is constant. When occupancy is time-varying, the number of possible scenarios (i.e., which particular set of occupants are present in the room) becomes exponentially large, posing a combinatorial challenge. However, we show that LLR, a recently-developed combinatorial MAB online learning algorithm that requires recording and computation of only a polynomial number of quantities can be applied to this setting, yielding a regret (cumulative gap in average satisfaction with respect to a distribution aware genie) that grows only polynomially in the number of users, and logarithmically with time. This in turn indicates that difference in unit-time satisfaction obtained by the learning policy compared to the optimal tends to 0. We quantify the performance of these online learning algorithms using real data collected from users of a participatory sensing iPhone app in a multi-occupancy room in an office building in Southern California.","PeriodicalId":184274,"journal":{"name":"Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 5th ACM Workshop on Embedded Systems For Energy-Efficient Buildings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2528282.2528296","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

We consider the problem of automatically learning the optimal thermal control in a room in order to maximize the expected average satisfaction among occupants who provide stochastic feedback on their comfort through a participatory sensing application. Assuming no prior knowledge or model of user comfort, we first apply the classic UCB1 online learning policy for multi-armed bandits (MAB), which combines exploration (testing out certain temperatures to better understand user preferences) with exploitation (spending more time at temperatures that maximize average satisfaction), for the case when total occupancy is constant. When occupancy is time-varying, the number of possible scenarios (i.e., which particular set of occupants is present in the room) becomes exponentially large, posing a combinatorial challenge. However, we show that LLR, a recently developed combinatorial MAB online learning algorithm that requires recording and computing only a polynomial number of quantities, can be applied to this setting, yielding a regret (the cumulative gap in average satisfaction with respect to a distribution-aware genie) that grows only polynomially in the number of users and logarithmically with time. This in turn indicates that the difference in per-unit-time satisfaction between the learning policy and the optimal policy tends to 0. We quantify the performance of these online learning algorithms using real data collected from users of a participatory sensing iPhone app in a multi-occupancy room in an office building in Southern California.
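The constant-occupancy case maps directly onto a standard stochastic bandit: each candidate temperature setpoint is an arm, and the reward at each step is the (stochastic) average satisfaction reported by the occupants present. The following is a minimal sketch of UCB1 applied this way; the discretized setpoints, the `occupant_feedback` simulator, and the horizon are illustrative assumptions, not the paper's experimental setup.

```python
import math
import random

def ucb1_thermal_control(setpoints, occupant_feedback, horizon):
    """UCB1 over a discrete set of temperature setpoints.

    occupant_feedback(temp) returns a reward in [0, 1], e.g. the fraction
    of present occupants reporting they are comfortable at this step.
    """
    n = [0] * len(setpoints)        # times each setpoint has been tried
    mean = [0.0] * len(setpoints)   # empirical mean satisfaction per setpoint

    for t in range(1, horizon + 1):
        if t <= len(setpoints):
            i = t - 1               # initialization: try each setpoint once
        else:
            # UCB1 index: empirical mean plus exploration bonus
            i = max(range(len(setpoints)),
                    key=lambda j: mean[j] + math.sqrt(2 * math.log(t) / n[j]))
        r = occupant_feedback(setpoints[i])
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]   # incremental mean update
    # Return the setpoint with the highest empirical satisfaction
    return setpoints[max(range(len(setpoints)), key=lambda j: mean[j])]

# Hypothetical simulator (not from the paper): each occupant votes
# "comfortable" with a probability that peaks near 22.5 C.
def occupant_feedback(temp, n_occupants=4):
    votes = [random.random() < max(0.0, 1.0 - abs(temp - 22.5) / 3.0)
             for _ in range(n_occupants)]
    return sum(votes) / n_occupants

best = ucb1_thermal_control(setpoints=[20.0, 21.0, 22.0, 23.0, 24.0],
                            occupant_feedback=occupant_feedback,
                            horizon=2000)
print("learned setpoint:", best)
```

For the time-varying-occupancy case, LLR avoids treating every subset of occupants as a separate arm; roughly, it keeps statistics at the level of individual components (here, per user-setpoint pair) rather than per scenario, which is what keeps the bookkeeping polynomial in the number of users, as the abstract notes.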