Approximately optimal policies for a class of Markov decision problems with applications to energy harvesting

Dor Shaviv, Ayfer Özgür
Published in: 2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), pp. 1-8
Publication date: 2017-05-15
DOI: 10.23919/WIOPT.2017.7959931 (https://doi.org/10.23919/WIOPT.2017.7959931)
Citation count: 4

Abstract

We consider a general class of stochastic optimization problems, in which the state represents a certain level or amount which can be partly used and depleted, and subsequently filled by a random amount. This is motivated by energy harvesting applications, in which one manages the amount of energy in a battery, but is also related to inventory models and queuing models. We propose a simple policy that requires minimal knowledge of the distribution of the stochastic process involved, and show that it is a close approximation to the optimal solution with bounded guarantees. Specifically, under natural assumptions on the reward function, we provide constant multiplicative and additive gaps to optimality, which do not depend on the problem parameters. This allows us to obtain a simple formula for approximating the long-term expected average reward, which gives some insight into its qualitative behavior as a function of the maximal state and the distribution of the disturbance.
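The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's policy or analysis; it only shows one plausible reading of the model, in which a stored level is partly used, earns a reward, and is then refilled by a random amount up to a maximal state. The update rule, the fixed-fraction policy, the uniform refill distribution, and the concave reward `0.5 * log2(1 + g)` are all assumptions chosen for illustration, as are the names `simulate_average_reward` and `fixed_fraction_policy`.

```python
import math
import random


def simulate_average_reward(policy, arrivals, capacity, reward):
    """Average per-step reward of `policy` on a capacity-limited storage model.

    Assumed dynamics (not taken from the paper):
        level <- min(level - used + refill, capacity)
    """
    level = capacity  # assumption: the store starts full
    total = 0.0
    for refill in arrivals:
        # The policy may not use a negative amount or more than is stored.
        used = min(max(policy(level), 0.0), level)
        total += reward(used)
        # Refill by the random amount, clipped at the maximal state.
        level = min(level - used + refill, capacity)
    return total / len(arrivals)


def fixed_fraction_policy(fraction):
    """Hypothetical policy: use a constant fraction of the current level."""
    return lambda level: fraction * level


if __name__ == "__main__":
    rng = random.Random(0)
    capacity = 10.0
    # Assumed disturbance: i.i.d. uniform refills on [0, 4].
    arrivals = [rng.uniform(0.0, 4.0) for _ in range(10_000)]
    # The policy only needs the mean of the (capacity-clipped) refill.
    mean_refill = sum(min(a, capacity) for a in arrivals) / len(arrivals)
    policy = fixed_fraction_policy(mean_refill / capacity)
    avg = simulate_average_reward(
        policy, arrivals, capacity, lambda g: 0.5 * math.log2(1.0 + g)
    )
    print(f"average reward per step: {avg:.3f}")
```

The fraction is set from the mean refill alone, which mirrors the abstract's point that a simple policy may need only minimal knowledge of the disturbance distribution. Whether this particular choice matches the policy analyzed in the paper is not stated in this record.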