Interval Markov Decision Processes with Continuous Action-Spaces

Giannis Delimpaltadakis, Morteza Lahijanian, M. Mazo, L. Laurenti
Proceedings of the 26th ACM International Conference on Hybrid Systems: Computation and Control
Published: 2022-11-02
DOI: 10.1145/3575870.3587117 (https://doi.org/10.1145/3575870.3587117)
Citations: 3

Abstract

Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models in which the transition probabilities belong to intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, because no algorithms exist for synthesis over IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into |𝒬| max problems, where |𝒬| is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also gain other interesting insights: e.g., in certain cases where the action set 𝒜 is a polytope, synthesis over a discrete-action IMDP, where the actions are the vertices of 𝒜, is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
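To make the max-min structure of value iteration concrete, the following is a minimal Python sketch of robust value iteration over an interval model whose probability bounds depend on the action. It is not the decomposition proposed in the paper: the inner minimization uses the standard sorting procedure for interval-bounded distributions, and the outer maximization simply enumerates a finite candidate set of actions (for instance, the vertices of a polytope 𝒜, which the abstract says suffices only in certain cases). The function names, the state-reward model, the discount factor `gamma`, and the two-state example are all assumptions made for illustration.

```python
import numpy as np


def worst_case_expectation(lower, upper, values):
    """Minimize sum_j p[j] * values[j] subject to lower <= p <= upper, sum(p) = 1.

    Assumes sum(lower) <= 1 <= sum(upper) so that a feasible p exists.
    """
    p = np.asarray(lower, dtype=float).copy()
    budget = 1.0 - p.sum()  # probability mass still to be assigned
    for j in np.argsort(values):  # lowest-value successor states first
        add = min(upper[j] - p[j], budget)
        p[j] += add
        budget -= add
    return float(p @ np.asarray(values, dtype=float))


def robust_value_iteration(n_states, actions, bounds, rewards, gamma=0.9, n_iter=50):
    """Max over candidate actions of the worst-case expected value.

    bounds(q, a) is a hypothetical callback returning (lower, upper) arrays
    over successor states for state q under action a.
    """
    values = np.zeros(n_states)
    for _ in range(n_iter):
        new_values = np.empty(n_states)
        for q in range(n_states):
            best = max(
                worst_case_expectation(*bounds(q, a), values) for a in actions
            )
            new_values[q] = rewards[q] + gamma * best
        values = new_values
    return values


def example_bounds(q, a):
    """Illustrative two-state model with a scalar action a in [0, 1]."""
    if q == 1:  # state 1 is absorbing regardless of the action
        return np.array([0.0, 1.0]), np.array([0.0, 1.0])
    # From state 0, a larger action shifts the interval toward state 1.
    lower = np.array([0.7 - 0.6 * a, 0.1 + 0.6 * a])
    upper = np.array([0.9 - 0.6 * a, 0.3 + 0.6 * a])
    return lower, upper


example_rewards = np.array([0.0, 1.0])
example_values = robust_value_iteration(2, [0.0, 1.0], example_bounds, example_rewards)
```

Here the candidate actions `[0.0, 1.0]` are the vertices of the interval [0, 1]. Whether restricting to vertices loses optimality depends on how the bounds vary with the action, which is exactly the kind of question the paper addresses.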