Distributionally Robust Chance-Constrained Markov Decision Processes with Random Payoff

Hoang Nam Nguyen, Abdel Lisser, Vikas Vikram Singh

Applied Mathematics and Optimization, 90(1), published 2024-07-26. DOI: 10.1007/s00245-024-10167-w
A Markov Decision Process (MDP) is a framework for sequential decision-making: the decision maker's goal is to maximize the expected discounted value of future rewards while moving through states governed by a Markov chain. In this paper, we focus on the case where the transition probability vector is deterministic, while the reward vector is uncertain and follows a partially known distribution. We model the MDP with a distributionally robust chance-constraint approach, which entails constructing an ambiguity set of candidate reward distributions characterized by moments or statistical metrics. We consider two settings for these ambiguity sets: one where the reward vector has real support and another where it is constrained to be nonnegative. In the real-support case, we show that solving the distributionally robust chance-constrained MDP is equivalent to a second-order cone programming problem under moments and \(\phi \)-divergence ambiguity sets, and to a mixed-integer second-order cone programming problem under Wasserstein-distance ambiguity sets. In contrast, when the reward vector is nonnegative, the equivalent optimization problems differ: moments-based ambiguity sets lead to a copositive optimization problem, while Wasserstein-distance-based ambiguity sets yield a biconvex optimization problem. To illustrate the practical application of these methods, we study a machine replacement problem and report results on randomly generated instances showing the effectiveness of the proposed methods.
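As a rough illustration of the moments-based case (a minimal sketch using the standard one-sided Chebyshev reformulation, not necessarily the paper's exact constraint): for the ambiguity set of all distributions with mean mu and covariance Sigma, the worst-case chance constraint inf_P P(r^T x >= y) >= 1 - eps reduces to the second-order cone constraint y <= mu^T x - sqrt((1 - eps)/eps) * ||Sigma^{1/2} x||_2, which is why a second-order cone program appears. All numbers below are hypothetical.

```python
# Sketch: worst-case guaranteed discounted reward under a moments
# ambiguity set {P : E[r] = mu, Cov[r] = Sigma}. By the one-sided
# Chebyshev (Cantelli) bound, the distributionally robust chance
# constraint  inf_P P(r^T x >= y) >= 1 - eps  holds iff
#     y <= mu^T x - sqrt((1 - eps)/eps) * ||Sigma^{1/2} x||_2 ,
# a second-order cone constraint in x.
import numpy as np

def worst_case_value(mu, Sigma, x, eps):
    """Largest y such that every distribution with mean mu and
    covariance Sigma puts probability >= 1 - eps on {r^T x >= y}."""
    kappa = np.sqrt((1.0 - eps) / eps)
    std = np.sqrt(x @ Sigma @ x)  # equals ||Sigma^{1/2} x||_2
    return mu @ x - kappa * std

# Toy instance: occupation-measure-like weights x over two
# state-action pairs, uncertain reward with known mean/covariance.
mu = np.array([1.0, 2.0])
Sigma = np.array([[0.04, 0.0], [0.0, 0.09]])
x = np.array([0.5, 0.5])
print(round(worst_case_value(mu, Sigma, x, eps=0.1), 4))  # 90% guarantee
```

Tightening the risk level eps shrinks the guaranteed value, since kappa = sqrt((1 - eps)/eps) grows as eps decreases.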
Journal overview:
The Applied Mathematics and Optimization Journal covers a broad range of mathematical methods, particularly those that bridge to optimization and have some connection with applications. Core topics include the calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games, and optimal transport. Algorithmic, data-analytic, machine learning, and numerical methods that support the modeling and analysis of optimization problems are encouraged. Of particular interest are papers that present a novel idea in theory or modeling together with a connection to potential applications in science and engineering.