Distribution-based objectives for Markov Decision Processes

S. Akshay, B. Genest, Nikhil Vyas

Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, published 2018-04-25
DOI: 10.1145/3209108.3209185 (https://doi.org/10.1145/3209108.3209185)
Citations: 10

Abstract

We consider distribution-based objectives for Markov Decision Processes (MDPs). This class of objectives gives rise to an interesting trade-off between full and partial information. As in full observation, the strategy in the MDP can depend on the state of the system, but similar to partial information, the strategy needs to account for all the states at the same time. In this paper, we focus on two safety problems that arise naturally in this context, namely, existential and universal safety. Given an MDP A and a closed and convex polytope H of probability distributions over the states of A, the existential safety problem asks whether there exist a distribution Δ in H and a strategy of A such that starting from Δ and repeatedly applying this strategy keeps the distribution forever in H. The universal safety problem asks whether, for every distribution in H, there exists a strategy of A that keeps the distribution forever in H. We prove that both problems are decidable, with tight complexity bounds: we show that existential safety is PTIME-complete, while universal safety is co-NP-complete. Further, we compare these results with existential and universal safety problems for Rabin's probabilistic finite-state automata (PFAs), the subclass of Partially Observable MDPs which have zero observation. In contrast to MDPs, strategies of PFAs are not state-dependent. In sharp contrast to the PTIME result, we show that existential safety for PFAs is undecidable, with H having closed and open boundaries. On the other hand, it turns out that universal safety for PFAs is decidable in EXPTIME, with a co-NP lower bound. Finally, we show that an alternate representation of the input polytope allows us to improve the complexity of universal safety for MDPs and PFAs.
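
To make the safety objective concrete, the following is a minimal sketch, not the paper's decision procedure. It assumes a memoryless randomized strategy (each state picks a distribution over actions) and a polytope H written as linear constraints C x <= d. The names `step`, `in_polytope`, `stays_safe`, `TRANS`, and the two-state example are all hypothetical and chosen only for illustration. A bounded simulation like this can refute safety by finding a step at which the distribution leaves H, but it cannot prove that the distribution stays in H forever.

```python
from typing import Dict, List

Dist = List[float]                      # probability distribution over states
Strategy = List[Dict[str, float]]       # strategy[s][a] = prob. of action a in state s
Trans = Dict[str, List[List[float]]]    # trans[a][s][t] = prob. of moving s -> t under a


def step(dist: Dist, trans: Trans, strategy: Strategy) -> Dist:
    """Push the distribution one step forward under the given strategy."""
    n = len(dist)
    new = [0.0] * n
    for s in range(n):
        for action, p_action in strategy[s].items():
            for t in range(n):
                new[t] += dist[s] * p_action * trans[action][s][t]
    return new


def in_polytope(dist: Dist, C: List[List[float]], d: List[float],
                tol: float = 1e-9) -> bool:
    """Check C x <= d row by row (assumed half-space encoding of H)."""
    return all(
        sum(c * x for c, x in zip(row, dist)) <= bound + tol
        for row, bound in zip(C, d)
    )


def stays_safe(dist: Dist, trans: Trans, strategy: Strategy,
               C: List[List[float]], d: List[float], steps: int) -> bool:
    """Return False as soon as the distribution leaves H within `steps` steps."""
    for _ in range(steps + 1):
        if not in_polytope(dist, C, d):
            return False
        dist = step(dist, trans, strategy)
    return True


# Hypothetical two-state example: "stay" keeps the state, "swap" exchanges them.
TRANS: Trans = {
    "stay": [[1.0, 0.0], [0.0, 1.0]],
    "swap": [[0.0, 1.0], [1.0, 0.0]],
}
# H = { x : x[0] >= 0.5 }, encoded as -x[0] <= -0.5.
C = [[-1.0, 0.0]]
D = [-0.5]
START = [0.6, 0.4]

ALWAYS_STAY: Strategy = [{"stay": 1.0}, {"stay": 1.0}]
ALWAYS_SWAP: Strategy = [{"swap": 1.0}, {"swap": 1.0}]
# State-dependent choice, available in an MDP but not in a PFA.
MIXED: Strategy = [{"stay": 1.0}, {"swap": 1.0}]
```

In this toy instance, `ALWAYS_SWAP` moves the distribution to (0.4, 0.6) after one step, which violates the constraint, while `MIXED` collects all mass in state 0 by choosing a different action per state. A PFA-style strategy must pick the same action in every state, so `MIXED` is exactly the kind of strategy the MDP setting allows and the PFA setting does not.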