Interval Dominance based Structural Results for Markov Decision Process

V. Krishnamurthy
{"title":"Interval Dominance based Structural Results for Markov Decision Process","authors":"V. Krishnamurthy","doi":"10.48550/arXiv.2203.10618","DOIUrl":null,"url":null,"abstract":"Structural results impose sufficient conditions on the model parameters of a Markov decision process (MDP) so that the optimal policy is an increasing function of the underlying state. The classical assumptions for MDP structural results require supermodularity of the rewards and transition probabilities. However, supermodularity does not hold in many applications. This paper uses a sufficient condition for interval dominance (called I) proposed in the microeconomics literature, to obtain structural results for MDPs under more general conditions. We present several MDP examples where supermodularity does not hold, yet I holds, and so the optimal policy is monotone; these include sigmoidal rewards (arising in prospect theory for human decision making), bi-diagonal and perturbed bi-diagonal transition matrices (in optimal allocation problems). We also consider MDPs with TP3 transition matrices and concave value functions. Finally, reinforcement learning algorithms that exploit the differential sparse structure of the optimal monotone policy are discussed.","PeriodicalId":13196,"journal":{"name":"IEEE Robotics Autom. Mag.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Robotics Autom. Mag.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2203.10618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Structural results impose sufficient conditions on the model parameters of a Markov decision process (MDP) so that the optimal policy is an increasing function of the underlying state. The classical assumptions for MDP structural results require supermodularity of the rewards and transition probabilities. However, supermodularity does not hold in many applications. This paper uses a sufficient condition for interval dominance (called I) proposed in the microeconomics literature, to obtain structural results for MDPs under more general conditions. We present several MDP examples where supermodularity does not hold, yet I holds, and so the optimal policy is monotone; these include sigmoidal rewards (arising in prospect theory for human decision making), bi-diagonal and perturbed bi-diagonal transition matrices (in optimal allocation problems). We also consider MDPs with TP3 transition matrices and concave value functions. Finally, reinforcement learning algorithms that exploit the differential sparse structure of the optimal monotone policy are discussed.
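
For context, the classical sufficient condition the abstract refers to is supermodularity of the state-action value function Q; the statement below is standard background (Topkis) and not the paper's weaker interval dominance condition (I). A minimal form, with x the state and u the action:

\[
% Standard Topkis supermodularity condition (background only);
% the paper's interval dominance condition (I) is a weaker requirement.
Q(x+1,\,u+1) - Q(x+1,\,u) \;\ge\; Q(x,\,u+1) - Q(x,\,u)
\qquad \text{for all } x,\, u,
\]

which guarantees that the largest maximizer \(u^*(x) = \arg\max_u Q(x,u)\) is nondecreasing in x. The interval dominance condition (I) studied in the paper applies more generally, so monotone optimal policies can still be established in cases where supermodularity fails, such as the sigmoidal rewards and bi-diagonal transition matrices mentioned above.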