{"title":"Finite horizon analysis of infinite CTMDPs","authors":"P. Buchholz","doi":"10.1109/DSN.2012.6263929","DOIUrl":null,"url":null,"abstract":"Continuous Time Markov Decision Processes (CTMDPs) are used to describe optimization problems in many applications including system maintenance and control. Often one is interested in a control strategy or policy to optimize the gain of a system over a finite interval which is denoted as finite horizon. The computation of an ε-optimal policy, i.e., a policy that reaches the optimal gain up to some small ε, is often hindered by state space explosion which means that state spaces of realistic models can be very large or even infinite. The paper presents new algorithms to compute approximately optimal policies for CTMDPs with large or infinite state spaces. The new approach allows one to compute bounds on the achievable gain and a policy to reach the lower bound using a variant of uniformization on a finite subset of the state space. It is also shown how the approach can be applied to models with unbounded rewards or transition rates for which uniformization cannot be applied per se.","PeriodicalId":236791,"journal":{"name":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","volume":"292 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSN.2012.6263929","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Continuous Time Markov Decision Processes (CTMDPs) are used to describe optimization problems in many applications, including system maintenance and control. Often one is interested in a control strategy, or policy, that optimizes the gain of a system over a finite interval, known as a finite horizon. The computation of an ε-optimal policy, i.e., a policy that reaches the optimal gain up to some small ε, is often hindered by state space explosion: the state spaces of realistic models can be very large or even infinite. The paper presents new algorithms to compute approximately optimal policies for CTMDPs with large or infinite state spaces. The new approach computes bounds on the achievable gain, together with a policy that attains the lower bound, using a variant of uniformization on a finite subset of the state space. It is also shown how the approach can be applied to models with unbounded rewards or transition rates, for which uniformization cannot be applied per se.
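To illustrate the kind of computation the abstract refers to, the sketch below shows textbook uniformization-based value iteration for a finite-horizon CTMDP: the generator matrices are converted to discrete-time kernels P_a = I + Q_a/Λ with uniformization rate Λ, and roughly Λ·T Bellman backups approximate the gain over horizon T. This is only the standard discretization scheme, not the paper's bounding algorithm on a truncated state space, and the toy maintenance model and all names are hypothetical.

```python
# Minimal sketch: uniformization-based finite-horizon value iteration for a
# small CTMDP. Standard textbook discretization with step h = 1/Lam; it does
# NOT implement the paper's bound computation. Toy model is hypothetical.
import numpy as np

# Toy model: 2 states (0 = working, 1 = failed), 2 actions (0 = wait, 1 = repair).
# Q[a] is the generator matrix under action a; r[a] is the reward-rate vector.
Q = np.array([
    [[-0.1, 0.1],    # wait: failure at rate 0.1, no recovery
     [ 0.0, 0.0]],
    [[-0.1, 0.1],    # repair: same failure rate while working,
     [ 2.0, -2.0]],  # failed state is repaired at rate 2.0
])
r = np.array([
    [1.0,  0.0],     # wait: earn 1 per time unit while working
    [0.8, -0.5],     # repair: maintenance cost reduces the reward rate
])

T = 10.0                                                     # finite horizon
Lam = max(-Q[a][s, s] for a in range(2) for s in range(2))   # uniformization rate
K = int(np.ceil(Lam * T))                                    # number of steps
P = np.stack([np.eye(2) + Q[a] / Lam for a in range(2)])     # DTMC kernels

v = np.zeros(2)                          # terminal values at time T
policy = np.zeros((K, 2), dtype=int)
for k in reversed(range(K)):
    # Bellman backup: per-step reward r/Lam plus expected continuation value
    qvals = r / Lam + P @ v              # shape (actions, states)
    policy[k] = qvals.argmax(axis=0)
    v = qvals.max(axis=0)

print("approximate optimal gain from each state:", v)
print("initial action per state:", policy[0])
```

The discretization error of this scheme shrinks as Λ grows, but the number of backups grows with Λ·T, and the whole state space must be enumerated; the paper's contribution is precisely to replace this by a uniformization variant on a finite subset of a large or infinite state space, yielding upper and lower bounds on the gain.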