Rationally inattentive Markov decision processes over a finite horizon
Ehsan Shafieepoorfard, M. Raginsky
2017 51st Asilomar Conference on Signals, Systems, and Computers, October 2017
DOI: 10.1109/ACSSC.2017.8335416
Citations: 0
Abstract
The framework of Rationally Inattentive Markov Decision Processes (RIMDPs) extends Partially Observable Markov Decision Processes (POMDPs) to the case in which the observation kernel governing the information-gathering process is also selected by the decision maker. At each time step, an observation kernel is chosen subject to a constraint on the Shannon conditional mutual information between the history of states and the current observation, given the history of past observations. This set-up arises naturally in networked control systems, artificial intelligence, and economic decision-making by boundedly rational agents. We show that, under certain structural assumptions on the information pattern and on the optimal policy, Bellman's Principle of Optimality can be used to derive a general dynamic programming recursion for this problem that reduces to solving a sequence of conditional rate-distortion problems.
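The single-stage building block of the recursion described above is a rate-distortion problem: minimize expected cost subject to a mutual-information budget on the observation kernel. As an illustrative sketch only (the paper's recursion involves *conditional* rate-distortion with histories; here we show the plain unconditional case via the standard Blahut-Arimoto iteration, with a hypothetical distortion matrix standing in for the per-stage cost):

```python
import numpy as np

def blahut_arimoto(p_x, dist, beta, n_iter=500, tol=1e-12):
    """Blahut-Arimoto iteration for the rate-distortion trade-off.

    p_x  : (n,) source distribution over states
    dist : (n, m) distortion matrix d(x, xhat); here a stand-in for the
           per-stage cost of acting on a coarse observation (assumption,
           not the paper's exact cost structure)
    beta : Lagrange multiplier trading information rate against distortion
    Returns (rate in nats, expected distortion, kernel q(xhat | x)).
    """
    n, m = dist.shape
    r = np.full(m, 1.0 / m)                  # marginal over reproductions
    for _ in range(n_iter):
        # optimal kernel for fixed marginal: q(xhat|x) ∝ r(xhat) e^{-beta d}
        q = r[None, :] * np.exp(-beta * dist)
        q /= q.sum(axis=1, keepdims=True)
        r_new = p_x @ q                      # induced marginal
        if np.max(np.abs(r_new - r)) < tol:
            r = r_new
            break
        r = r_new
    q = r[None, :] * np.exp(-beta * dist)
    q /= q.sum(axis=1, keepdims=True)
    # mutual information I(X; Xhat) in nats under the converged kernel
    log_ratio = np.where(q > 0, np.log(q / r[None, :]), 0.0)
    rate = float(np.sum(p_x[:, None] * q * log_ratio))
    d_avg = float(np.sum(p_x[:, None] * q * dist))
    return rate, d_avg, q
```

Sweeping `beta` traces the trade-off the information constraint induces: `beta = 0` gives a state-independent kernel (zero rate, maximal inattention), while large `beta` recovers a near-deterministic kernel whose rate approaches the source entropy.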