{"title":"一类马尔可夫决策问题的近似最优策略及其在能量收集中的应用","authors":"Dor Shaviv, Ayfer Özgür","doi":"10.23919/WIOPT.2017.7959931","DOIUrl":null,"url":null,"abstract":"We consider a general class of stochastic optimization problems, in which the state represents a certain level or amount which can be partly used and depleted, and subsequently filled by a random amount. This is motivated by energy harvesting applications, in which one manages the amount of energy in a battery, but is also related to inventory models and queuing models. We propose a simple policy that requires minimal knowledge of the distribution of the stochastic process involved, and show that it is a close approximation to the optimal solution with bounded guarantees. Specifically, under natural assumptions on the reward function, we provide constant multiplicative and additive gaps to optimality, which do not depend on the problem parameters. This allows us to obtain a simple formula for approximating the long-term expected average reward, which gives some insight on its qualitative behavior as a function of the maximal state and the distribution of the disturbance.","PeriodicalId":6630,"journal":{"name":"2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)","volume":"182 1","pages":"1-8"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Approximately optimal policies for a class of Markov decision problems with applications to energy harvesting\",\"authors\":\"Dor Shaviv, Ayfer Özgür\",\"doi\":\"10.23919/WIOPT.2017.7959931\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider a general class of stochastic optimization problems, in which the state represents a certain level or amount which can be partly used and depleted, and subsequently filled by a random amount. 
This is motivated by energy harvesting applications, in which one manages the amount of energy in a battery, but is also related to inventory models and queuing models. We propose a simple policy that requires minimal knowledge of the distribution of the stochastic process involved, and show that it is a close approximation to the optimal solution with bounded guarantees. Specifically, under natural assumptions on the reward function, we provide constant multiplicative and additive gaps to optimality, which do not depend on the problem parameters. This allows us to obtain a simple formula for approximating the long-term expected average reward, which gives some insight on its qualitative behavior as a function of the maximal state and the distribution of the disturbance.\",\"PeriodicalId\":6630,\"journal\":{\"name\":\"2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)\",\"volume\":\"182 1\",\"pages\":\"1-8\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/WIOPT.2017.7959931\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks 
(WiOpt)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/WIOPT.2017.7959931","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Approximately optimal policies for a class of Markov decision problems with applications to energy harvesting
We consider a general class of stochastic optimization problems in which the state represents a level or amount that can be partly used and depleted, and is subsequently replenished by a random amount. This is motivated by energy harvesting applications, in which one manages the amount of energy in a battery, but it is also related to inventory models and queueing models. We propose a simple policy that requires minimal knowledge of the distribution of the stochastic process involved, and show that it is a close approximation to the optimal solution with bounded guarantees. Specifically, under natural assumptions on the reward function, we provide constant multiplicative and additive gaps to optimality that do not depend on the problem parameters. This allows us to obtain a simple formula approximating the long-term expected average reward, which gives some insight into its qualitative behavior as a function of the maximal state and the distribution of the disturbance.
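The setting described in the abstract can be sketched in a few lines of simulation. The following is a minimal illustration, not the paper's actual construction: the battery capacity, the exponential harvest distribution, the log reward, and the fixed-fraction rule are all illustrative assumptions chosen to mimic the problem class (bounded state, random refill, concave reward, distribution-agnostic policy).

```python
import math
import random

def simulate(policy, capacity=10.0, mean_harvest=1.0, steps=100_000, seed=0):
    """Average reward of a battery MDP: spend u_t, earn log(1 + u_t),
    then refill by a random i.i.d. harvest, clipped at the capacity."""
    rng = random.Random(seed)
    battery, total = 0.0, 0.0
    for _ in range(steps):
        u = min(policy(battery), battery)               # cannot spend more than stored
        total += math.log1p(u)                          # concave reward (illustrative choice)
        harvest = rng.expovariate(1.0 / mean_harvest)   # random refill amount
        battery = min(battery - u + harvest, capacity)  # state bounded by capacity
    return total / steps

# A distribution-agnostic rule in the spirit of the abstract:
# spend a fixed fraction of the current battery level each step.
avg_fraction = simulate(lambda b: 0.5 * b)
avg_greedy = simulate(lambda b: b)  # baseline: deplete the battery every step
```

Under a concave reward, smoothing consumption over time (the fixed-fraction rule) tends to outperform greedily depleting the state, which is the qualitative behavior the bounded-gap results formalize.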