{"title":"DISCRETE DYNAMIC PROGRAMMING WITH RECURSIVE ADDITIVE SYSTEM","authors":"Seiichi Iwamoto","doi":"10.5109/13082","DOIUrl":null,"url":null,"abstract":"In the paper [5], N. Furukawa and S. Iwamoto have defined Markovian decision processes with a new broad class of reward systems, that is, recursive reward functions, and have studied the existence and properties of optimal policies. Under some conditions on the reward functions, they have proved that there exists a (p, s)-optimal stationary policy and that in the case of a finite action space there exists an optimal stationary policy. These are some generalizations of results by D. Blackwell [3]. In this paper the author defines a dynamic programming problem with a recursive additive system which is referred to one type of Markovian decision processes with recursive reward functions defined by the previous authors [5]. This paper gives an algorithm for finding optimal stationary policies in the dynamic programming with the recursive additive system in the case of finite state and action spaces. Furthermore, we give several interesting examples with numerical computations to obtain optimal policies. The motivation to consider the dynamic programming problem with the recursive additive system is the following : If we restrict the \" reward \" in narrow sense, for instance, the money in economic systems or the loss in statistical decision problems, it will be appropriate for us to accept the total sum of stage-wise rewards as a performance index. That is so-called additive reward system. But many practical problems in the field of engineerings enable us to interpret the \" reward \" in wider sense. In those problems we often encounter much complicated reward systems that are more than so-called additive. We have an interesting class of such complicated reward systems in which we can find a common feature named \" recursive additive \". By talking about various reward systems belonging to this class at the same time, we can make clear, as a dynamic programming problem, an important common property within the class, Our proofs are partially owing to Blackwell [2].","PeriodicalId":287765,"journal":{"name":"Bulletin of Mathematical Statistics","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1974-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bulletin of Mathematical Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5109/13082","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8
Abstract
In the paper [5], N. Furukawa and S. Iwamoto defined Markovian decision processes with a new, broad class of reward systems, namely recursive reward functions, and studied the existence and properties of optimal policies. Under some conditions on the reward functions, they proved that there exists a (p, s)-optimal stationary policy and that, in the case of a finite action space, there exists an optimal stationary policy. These results generalize results by D. Blackwell [3]. In this paper the author defines a dynamic programming problem with a recursive additive system, which is one type of the Markovian decision processes with recursive reward functions defined in [5]. The paper gives an algorithm for finding optimal stationary policies in dynamic programming with the recursive additive system in the case of finite state and action spaces. Furthermore, we give several interesting examples with numerical computations that obtain optimal policies. The motivation for considering the dynamic programming problem with the recursive additive system is the following: if we interpret the "reward" in a narrow sense, for instance as money in economic systems or as loss in statistical decision problems, it is appropriate to take the total sum of stage-wise rewards as a performance index; this is the so-called additive reward system. But many practical problems in the field of engineering lead us to interpret the "reward" in a wider sense. In those problems we often encounter complicated reward systems that are more than additive. There is an interesting class of such complicated reward systems sharing a common feature we call "recursive additive". By treating the various reward systems belonging to this class at the same time, we can make clear, as a dynamic programming problem, an important property common to the class. Our proofs partially follow Blackwell [2].
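The abstract does not reproduce the algorithm itself, but the idea it describes can be illustrated for finite state and action spaces. The sketch below is not Iwamoto's algorithm from the paper; it is a minimal successive-approximation (value-iteration) illustration under the assumption that the total reward accumulates stage-wise through a binary operation `combine`: taking `combine(r, v) = r + beta*v` recovers the ordinary additive, discounted case, while forms such as `r * v` or `min(r, v)` suggest other members of the class. All names, the data layout, and the specific accumulation form are illustrative assumptions, and convergence is only guaranteed when `combine` makes the update a contraction (as in the discounted additive case).

```python
import numpy as np

def value_iteration(P, R, combine, n_iter=500, tol=1e-10):
    """Successive approximation for a finite Markovian decision process
    whose total reward accumulates recursively:

        V(x) = max_a  sum_y  P[a][x, y] * combine(R[a][x, y], V(y))

    P : dict mapping action a -> (S x S) transition matrix
    R : dict mapping action a -> (S x S) stage-wise reward matrix
    combine : accumulation operation, e.g. lambda r, v: r + 0.9 * v
    Returns the approximate optimal value and a stationary policy.
    """
    S = next(iter(P.values())).shape[0]
    V = np.zeros(S)
    actions = sorted(P)
    policy = np.zeros(S, dtype=int)
    for _ in range(n_iter):
        Q = np.empty((len(actions), S))
        for i, a in enumerate(actions):
            # expected recursively accumulated reward when action a
            # is used in every state (broadcasts V over next states)
            Q[i] = (P[a] * combine(R[a], V[np.newaxis, :])).sum(axis=1)
        V_new = Q.max(axis=0)
        policy = Q.argmax(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, np.array([actions[i] for i in policy])

# Hypothetical two-state, two-action example with discounted
# additive accumulation (beta = 0.9):
P = {0: np.array([[0.8, 0.2], [0.3, 0.7]]),
     1: np.array([[0.5, 0.5], [0.9, 0.1]])}
R = {0: np.array([[1.0, 0.0], [0.0, 2.0]]),
     1: np.array([[0.5, 1.5], [1.0, 0.0]])}
V, pi = value_iteration(P, R, combine=lambda r, v: r + 0.9 * v)
print(V, pi)
```

Because the maximizing action is recomputed from the same one-stage lookahead at every state, the returned policy is stationary, which is the object of interest in the abstract's finite-state, finite-action setting.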