{"title":"A counterexample and a corrective to the vector extension of the Bellman equations of a Markov decision process","authors":"Anas Mifrani","doi":"10.1007/s10479-024-06439-x","DOIUrl":null,"url":null,"abstract":"<div><p>Under the expected total reward criterion, the optimal value of a finite-horizon Markov decision process can be determined by solving the Bellman equations. The equations were extended by White to processes with vector rewards. Using a counterexample, we show that the assumptions underlying this extension fail to guarantee its validity. Analysis of the counterexample enables us to articulate a sufficient condition for White’s functional equations to be valid. The condition is shown to be true when the policy space has been refined to include a special class of non-Markovian policies, when the dynamics of the model are deterministic, and when the decision making horizon does not exceed two time steps. The paper demonstrates that in general, the solutions to White’s equations are sets of Pareto efficient policy returns over the refined policy space. Our results are illustrated with an example.</p></div>","PeriodicalId":8215,"journal":{"name":"Annals of Operations Research","volume":"345 1","pages":"351 - 369"},"PeriodicalIF":4.4000,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10479-024-06439-x.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annals of Operations Research","FirstCategoryId":"91","ListUrlMain":"https://link.springer.com/article/10.1007/s10479-024-06439-x","RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPERATIONS RESEARCH & MANAGEMENT SCIENCE","Score":null,"Total":0}
Citations: 0
Abstract
Under the expected total reward criterion, the optimal value of a finite-horizon Markov decision process can be determined by solving the Bellman equations. The equations were extended by White to processes with vector rewards. Using a counterexample, we show that the assumptions underlying this extension fail to guarantee its validity. Analysis of the counterexample enables us to articulate a sufficient condition for White’s functional equations to be valid. The condition is shown to be true when the policy space has been refined to include a special class of non-Markovian policies, when the dynamics of the model are deterministic, and when the decision-making horizon does not exceed two time steps. The paper demonstrates that in general, the solutions to White’s equations are sets of Pareto efficient policy returns over the refined policy space. Our results are illustrated with an example.
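For orientation, the sketch below implements the kind of set-valued backward recursion commonly associated with White's vector-valued Bellman equations: each state carries a set of candidate return vectors, and dominated vectors are discarded at every stage. This is only an assumed, illustrative formulation, not the paper's own construction; the two-state, two-action, two-objective instance and all names in the code (pareto_filter, white_recursion, P, R, horizon) are hypothetical.

```python
# Minimal sketch of a set-valued backward recursion of the kind associated
# with White's vector-valued Bellman equations. The data below describe a
# hypothetical two-state, two-action, two-objective MDP, not the paper's
# counterexample.

from itertools import product

import numpy as np


def pareto_filter(points):
    """Keep only the Pareto-efficient (non-dominated) return vectors."""
    pts = [np.asarray(p, dtype=float) for p in points]
    efficient = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(pts)
            if j != i
        )
        if not dominated:
            efficient.append(tuple(float(x) for x in p))
    return list(dict.fromkeys(efficient))  # drop exact duplicates


def white_recursion(P, R, horizon):
    """Backward induction on sets of return vectors:

        V_t(s) = eff{ r(s,a) + sum_{s'} p(s'|s,a) u_{s'} :
                      a in A(s), u_{s'} in V_{t+1}(s') for each s' }.

    P[a, s, s'] is a transition probability, R[a, s] a reward vector.
    """
    n_actions, n_states, _ = P.shape
    # Terminal stage: the only attainable return vector is the zero vector.
    V = [[(0.0, 0.0)] for _ in range(n_states)]
    for _ in range(horizon):
        V_new = []
        for s in range(n_states):
            candidates = []
            for a in range(n_actions):
                # Enumerate every choice of one continuation vector per
                # successor state and mix them with the transition law.
                for combo in product(*V):
                    cont = sum(
                        P[a, s, s2] * np.asarray(combo[s2])
                        for s2 in range(n_states)
                    )
                    candidates.append(R[a, s] + cont)
            V_new.append(pareto_filter(candidates))
        V = V_new
    return V


if __name__ == "__main__":
    # Hypothetical data: action 0 stays put, action 1 switches state.
    P = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.0, 1.0], [1.0, 0.0]]])
    # Reward vectors R[a, s] trade off two objectives.
    R = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[0.5, 0.5], [0.5, 0.5]]])
    for s, vals in enumerate(white_recursion(P, R, horizon=2)):
        print(f"state {s}: non-dominated return vectors {vals}")
```

As the abstract indicates, solving such equations does not by itself guarantee that the resulting sets are the efficient returns over Markovian policies; the paper's counterexample and sufficient condition address precisely that gap.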
About the journal
The Annals of Operations Research publishes peer-reviewed original articles dealing with key aspects of operations research, including theory, practice, and computation. The journal publishes full-length research articles, short notes, expositions and surveys, reports on computational studies, and case studies that present new and innovative practical applications.
In addition to regular issues, the journal publishes periodic special volumes that focus on defined fields of operations research, ranging from the highly theoretical to the algorithmic and the applied. These volumes have one or more Guest Editors who are responsible for collecting the papers and overseeing the refereeing process.