{"title":"风险概率标准下的有限视界部分可观测半马尔可夫决策过程","authors":"Xin Wen , Xianping Guo , Li Xia","doi":"10.1016/j.orl.2024.107187","DOIUrl":null,"url":null,"abstract":"<div><div>This paper deals with a risk probability minimization problem for finite horizon partially observable semi-Markov decision processes, which are the fairly most general models for stochastic dynamic systems. In contrast to the expected discounted and average criteria, the optimality investigated in this paper is to minimize the probability that the accumulated rewards do not reach a prescribed profit level at the finite terminal stage. First, the state space is augmented as the joint conditional distribution of the current unobserved state and the remaining profit goal. We introduce a class of policies depending on observable histories and a class of Markov policies including observable process with the joint conditional distribution. Then under mild assumptions, we prove that the value function is the unique solution to the optimality equation for the probability criterion by using iteration techniques. The existence of (<em>ϵ</em>-)optimal Markov policy for this problem is established. Finally, we use a bandit problem with the probability criterion to demonstrate our main results in which an effective algorithm and the corresponding numerical calculation are given for the semi-Markov model. 
Moreover, for the case of reduction to the discrete-time Markov model, we derive a concise solution.</div></div>","PeriodicalId":54682,"journal":{"name":"Operations Research Letters","volume":"57 ","pages":"Article 107187"},"PeriodicalIF":0.8000,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Finite horizon partially observable semi-Markov decision processes under risk probability criteria\",\"authors\":\"Xin Wen , Xianping Guo , Li Xia\",\"doi\":\"10.1016/j.orl.2024.107187\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper deals with a risk probability minimization problem for finite horizon partially observable semi-Markov decision processes, which are the fairly most general models for stochastic dynamic systems. In contrast to the expected discounted and average criteria, the optimality investigated in this paper is to minimize the probability that the accumulated rewards do not reach a prescribed profit level at the finite terminal stage. First, the state space is augmented as the joint conditional distribution of the current unobserved state and the remaining profit goal. We introduce a class of policies depending on observable histories and a class of Markov policies including observable process with the joint conditional distribution. Then under mild assumptions, we prove that the value function is the unique solution to the optimality equation for the probability criterion by using iteration techniques. The existence of (<em>ϵ</em>-)optimal Markov policy for this problem is established. Finally, we use a bandit problem with the probability criterion to demonstrate our main results in which an effective algorithm and the corresponding numerical calculation are given for the semi-Markov model. 
Moreover, for the case of reduction to the discrete-time Markov model, we derive a concise solution.</div></div>\",\"PeriodicalId\":54682,\"journal\":{\"name\":\"Operations Research Letters\",\"volume\":\"57 \",\"pages\":\"Article 107187\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2024-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Operations Research Letters\",\"FirstCategoryId\":\"91\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167637724001238\",\"RegionNum\":4,\"RegionCategory\":\"管理学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"OPERATIONS RESEARCH & MANAGEMENT SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Operations Research Letters","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167637724001238","RegionNum":4,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPERATIONS RESEARCH & MANAGEMENT SCIENCE","Score":null,"Total":0}
Finite horizon partially observable semi-Markov decision processes under risk probability criteria
This paper deals with a risk probability minimization problem for finite-horizon partially observable semi-Markov decision processes, which are among the most general models for stochastic dynamic systems. In contrast to the expected discounted and average criteria, the optimality criterion investigated in this paper is the minimization of the probability that the accumulated rewards fail to reach a prescribed profit level by the finite terminal stage. First, the state space is augmented with the joint conditional distribution of the current unobserved state and the remaining profit goal. We introduce a class of policies depending on observable histories and a class of Markov policies based on the observable process together with this joint conditional distribution. Then, under mild assumptions, we use iteration techniques to prove that the value function is the unique solution of the optimality equation for the probability criterion, and we establish the existence of an (ϵ-)optimal Markov policy. Finally, we illustrate the main results with a bandit problem under the probability criterion, for which an effective algorithm and the corresponding numerical calculations are given for the semi-Markov model. Moreover, when the model reduces to a discrete-time Markov model, we derive a concise solution.
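The goal-augmentation idea in the abstract can be illustrated with a small sketch for the discrete-time Markov special case it mentions: augment each state with the remaining profit goal and run backward value iteration on the probability of missing the target. This is not the paper's algorithm, and all model data below (two states, integer rewards, transition matrix, horizon, profit level) are illustrative assumptions.

```python
# Hedged sketch: risk-probability value iteration for a tiny fully
# observable, discrete-time Markov model. The state is augmented with
# the remaining profit goal g; V(n, s, g) is the minimal probability
# that the rewards earned from stage n onward total strictly less
# than g (i.e., the prescribed profit level is missed).

n_states = 2
actions = [0, 1]
horizon = 3                       # finite horizon N (illustrative)
target = 7                        # prescribed profit level (illustrative)

r = [[1, 2], [2, 3]]              # r[s][a]: integer reward in state s under action a
p = [[[0.7, 0.3], [0.4, 0.6]],    # p[s][a][s2]: transition probabilities
     [[0.5, 0.5], [0.2, 0.8]]]

lo = -horizon * max(max(row) for row in r)   # lowest goal level we need to track
goals = range(lo, target + 1)

# Terminal condition: at stage N the run has failed iff the goal is unmet (g > 0).
V = {(horizon, s, g): (1.0 if g > 0 else 0.0)
     for s in range(n_states) for g in goals}

policy = {}
for n in reversed(range(horizon)):
    for s in range(n_states):
        for g in goals:
            best_q, best_a = None, None
            for a in actions:
                g2 = max(g - r[s][a], lo)     # goal shrinks by the reward earned
                q = sum(p[s][a][s2] * V[(n + 1, s2, g2)]
                        for s2 in range(n_states))
                if best_q is None or q < best_q:
                    best_q, best_a = q, a
            V[(n, s, g)] = best_q
            policy[(n, s, g)] = best_a

print(V[(0, 0, target)])          # minimal risk probability from state 0
```

The clamp at `lo` is harmless because once the goal is met (g ≤ 0) the failure probability is already zero and stays zero. In the semi-Markov setting of the paper, the backward recursion would additionally integrate over the random sojourn times, and under partial observability the state s would be replaced by the joint conditional distribution described in the abstract.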
About the journal:
Operations Research Letters is committed to the rapid review and fast publication of short articles on all aspects of operations research and analytics. Apart from a limitation to eight journal pages, quality, originality, relevance and clarity are the only criteria for selecting the papers to be published. ORL covers the broad field of optimization, stochastic models and game theory. Specific areas of interest include networks, routing, location, queueing, scheduling, inventory, reliability, and financial engineering. We wish to explore interfaces with other fields such as life sciences and health care, artificial intelligence and machine learning, energy distribution, and computational social sciences and humanities. Our traditional strength is in methodology, including theory, modelling, algorithms and computational studies. We also welcome novel applications and concise literature reviews.