{"title":"信号时序逻辑约束下马尔可夫决策过程的成本最优控制","authors":"K. C. Kalagarla, R. Jain, P. Nuzzo","doi":"10.1109/ICC54714.2021.9703164","DOIUrl":null,"url":null,"abstract":"We present a method to find a cost-optimal policy for a given finite-horizon Markov decision process (MDP) with unknown transition probability, such that the probability of satisfying a given signal temporal logic specification is above a desired threshold. We propose an augmentation of the MDP state space to enable the expression of the STL objective as a reachability objective. In this augmented space, the optimal policy problem is re-formulated as a finite-horizon constrained Markov decision process (CMDP). We then develop a model-free reinforcement learning (RL) scheme to provide an approximately optimal policy for any general finite horizon CMDP problem. This scheme can make use of any off-the-shelf model-free RL algorithm and considers the general space of non-stationary randomized policies. Finally, we illustrate the applicability of our RL-based approach through two case studies.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cost-Optimal Control of Markov Decision Processes Under Signal Temporal Logic Constraints\",\"authors\":\"K. C. Kalagarla, R. Jain, P. Nuzzo\",\"doi\":\"10.1109/ICC54714.2021.9703164\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present a method to find a cost-optimal policy for a given finite-horizon Markov decision process (MDP) with unknown transition probability, such that the probability of satisfying a given signal temporal logic specification is above a desired threshold. We propose an augmentation of the MDP state space to enable the expression of the STL objective as a reachability objective. In this augmented space, the optimal policy problem is re-formulated as a finite-horizon constrained Markov decision process (CMDP). We then develop a model-free reinforcement learning (RL) scheme to provide an approximately optimal policy for any general finite horizon CMDP problem. This scheme can make use of any off-the-shelf model-free RL algorithm and considers the general space of non-stationary randomized policies. 
Finally, we illustrate the applicability of our RL-based approach through two case studies.\",\"PeriodicalId\":382373,\"journal\":{\"name\":\"2021 Seventh Indian Control Conference (ICC)\",\"volume\":\"78 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 Seventh Indian Control Conference (ICC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICC54714.2021.9703164\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Seventh Indian Control Conference (ICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICC54714.2021.9703164","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cost-Optimal Control of Markov Decision Processes Under Signal Temporal Logic Constraints
We present a method to find a cost-optimal policy for a given finite-horizon Markov decision process (MDP) with unknown transition probabilities, such that the probability of satisfying a given signal temporal logic (STL) specification is above a desired threshold. We propose an augmentation of the MDP state space that allows the STL objective to be expressed as a reachability objective. In this augmented space, the optimal policy problem is reformulated as a finite-horizon constrained Markov decision process (CMDP). We then develop a model-free reinforcement learning (RL) scheme that provides an approximately optimal policy for any general finite-horizon CMDP problem. This scheme can make use of any off-the-shelf model-free RL algorithm and considers the general space of non-stationary randomized policies. Finally, we illustrate the applicability of our RL-based approach through two case studies.
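To make the construction concrete, below is a minimal sketch of the state-space augmentation step, assuming a simple "eventually reach a goal region within the horizon" specification whose satisfaction can be tracked with a single Boolean flag; richer STL formulas would need a larger automaton-like memory. The simulator interface, the `GOAL` state, and all names here are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AugState:
    s: int       # original MDP state
    sat: bool    # latches to True once the specification is satisfied

GOAL = 7  # hypothetical goal state for the "eventually GOAL" formula

def augment_step(step, aug, a):
    """Wrap an unknown-model simulator step(s, a) -> (s_next, cost)
    so that the satisfaction flag evolves alongside the state."""
    s_next, cost = step(aug.s, a)
    sat_next = aug.sat or (s_next == GOAL)  # flag never resets
    return AugState(s_next, sat_next), cost

def reach_signal(aug, t, horizon):
    """Reachability objective in the augmented space: pays 1 at the
    final step iff the flag is set, so its expected value equals the
    probability of satisfying the specification."""
    return 1.0 if (t == horizon - 1 and aug.sat) else 0.0
```

On top of this augmented MDP, the constrained problem (minimize expected cost subject to the satisfaction probability exceeding the threshold) can be attacked with a Lagrangian primal-dual loop around any off-the-shelf model-free RL solver, which is one plausible reading of the scheme described above; the `rl_solve`/`evaluate` interfaces and the dual step size are assumptions for illustration, not the paper's exact algorithm.

```python
def primal_dual_cmdp(rl_solve, evaluate, threshold, iters=200, lr=0.05):
    """rl_solve(lam) -> policy maximizing E[lam * satisfaction - cost];
    evaluate(policy) -> (expected_cost, satisfaction_prob), e.g. from
    Monte Carlo rollouts of the augmented MDP."""
    lam, best, best_cost = 1.0, None, float("inf")
    for _ in range(iters):
        policy = rl_solve(lam)              # primal: best response to lam
        cost, p_sat = evaluate(policy)
        if p_sat >= threshold and cost < best_cost:
            best, best_cost = policy, cost  # keep best feasible policy
        # dual ascent: increase lam while the constraint is violated
        lam = max(0.0, lam + lr * (threshold - p_sat))
    return best, lam
```

Because the horizon is finite, the policies returned by `rl_solve` would need to be non-stationary (indexed by time step), matching the policy class considered in the abstract.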