Approximately optimal policies for a class of Markov decision problems with applications to energy harvesting

2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt). Pub Date: 2017-05-15. DOI: 10.23919/WIOPT.2017.7959931

Dor Shaviv, Ayfer Özgür


Citations: 4

Abstract

We consider a general class of stochastic optimization problems, in which the state represents a certain level or amount which can be partly used and depleted, and subsequently filled by a random amount. This is motivated by energy harvesting applications, in which one manages the amount of energy in a battery, but is also related to inventory models and queuing models. We propose a simple policy that requires minimal knowledge of the distribution of the stochastic process involved, and show that it is a close approximation to the optimal solution with bounded guarantees. Specifically, under natural assumptions on the reward function, we provide constant multiplicative and additive gaps to optimality, which do not depend on the problem parameters. This allows us to obtain a simple formula for approximating the long-term expected average reward, which gives some insight into its qualitative behavior as a function of the maximal state and the distribution of the disturbance.
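To make the setting concrete, the following is a minimal simulation sketch of the kind of process the abstract describes: a level that is partly used at each step and then refilled by a random amount, capped at a maximal state. The abstract does not specify the proposed policy, the reward function, or the disturbance distribution, so everything here is an assumption for illustration only: the fixed-fraction spending rule, the logarithmic reward, the Bernoulli arrivals, and all function and parameter names are hypothetical and should not be read as the authors' method.

```python
import math
import random


def simulate(capacity, fraction, horizon, arrival, reward, seed=0):
    """Simulate a level that is partly used each step and then randomly refilled.

    All names here are illustrative assumptions, not taken from the paper.
    Returns (average reward per step, list of visited levels).
    """
    rng = random.Random(seed)
    level = capacity          # start with a full store (arbitrary choice)
    total = 0.0
    levels = []
    for _ in range(horizon):
        levels.append(level)
        use = fraction * level            # hypothetical rule: spend a fixed fraction
        total += reward(use)
        # Deplete by the amount used, refill by a random amount, clip at capacity.
        level = min(level - use + arrival(rng), capacity)
    return total / horizon, levels


def bernoulli_arrival(rng, p=0.3, amount=1.0):
    """Assumed disturbance: `amount` arrives with probability p, else nothing."""
    return amount if rng.random() < p else 0.0


def log_reward(use):
    """Assumed concave, nondecreasing reward with reward(0) = 0."""
    return 0.5 * math.log1p(use)


avg, levels = simulate(
    capacity=1.0, fraction=0.3, horizon=10_000,
    arrival=bernoulli_arrival, reward=log_reward,
)
print(f"average reward per step: {avg:.4f}")
```

Varying `capacity` and the arrival parameters in this sketch gives a rough empirical feel for how the long-term average reward depends on the maximal state and the disturbance distribution, which is the dependence the abstract says its approximation formula captures.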

