Hakan Kılıç , Pelin Gülşah Canbolat , Evrim Didem Güneş
European Journal of Operational Research, Volume 328, Issue 3, Pages 877-893. Published 2025-09-23. DOI: 10.1016/j.ejor.2025.09.013. https://www.sciencedirect.com/science/article/pii/S0377221725007301
Markov decision processes: Monotonicity of optimal policy in exponential and quasi-hyperbolic discounting parameters
Intertemporal preferences of decision makers, i.e., the way they discount delayed utilities, impact their decisions. Empirical evidence suggests that individuals commonly have hyperbolic discounting preferences. This can result in time-inconsistent behavior, e.g., procrastination, which may be a barrier to adopting preventive behavior such as machine maintenance and patient adherence to treatment. In this paper, we theoretically compare the actions of individuals based on their discounting characteristics. We consider the Hyperbolic Discounting (HD) model, which is more representative of individual behavior than Exponential Discounting (ED). We formulate a discrete-time finite-horizon Markov decision process with Quasi-Hyperbolic Discounting (QHD), an analytically tractable function representing HD, and present sufficient conditions that ensure the monotonicity of the optimal policy in the discounting parameters. We consider submodular maximization or supermodular maximization problems. Our paper is the first to investigate the monotonicity of the optimal policy in QHD parameters for these problems. Moreover, we compare the optimal actions under ED and QHD. We apply our results to the settings of machine maintenance, individual health behavior, and inventory control. We provide numerical examples showing that monotonicity may fail when our sufficient conditions are not met. We also explore the discrepancy between the expected total exponentially-discounted rewards of the actions obtained from QHD and of the actions that are optimal under ED, and observe that this discrepancy is affected mainly by the present bias.
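The abstract does not state the functional form of QHD, so the following is only a minimal sketch assuming the standard beta-delta form: a reward delayed by t periods is weighted 1 when t = 0 and beta * delta^t when t >= 1, with beta < 1 capturing present bias and beta = 1 recovering ED. The function names, reward sizes, and parameter values are illustrative assumptions, not taken from the paper. The sketch shows how present bias can reverse a choice between a smaller-sooner and a larger-later reward as the choice moves closer in time.

```python
def qhd_weight(t: int, beta: float, delta: float) -> float:
    """Assumed beta-delta weight on a reward delayed by t periods."""
    if t < 0:
        raise ValueError("delay must be non-negative")
    # Immediate rewards are undiscounted; every delayed reward also
    # carries the present-bias factor beta.
    return 1.0 if t == 0 else beta * delta ** t


def prefers_later(small: float, large: float, delay: int,
                  beta: float, delta: float) -> bool:
    """True if `large` at delay + 1 is valued above `small` at delay."""
    value_small = qhd_weight(delay, beta, delta) * small
    value_large = qhd_weight(delay + 1, beta, delta) * large
    return value_large > value_small


# Hypothetical numbers: 10 units sooner versus 12 units one period later.
SMALL, LARGE, DELTA = 10.0, 12.0, 0.95

# Present-biased decision maker (beta < 1): the ranking depends on
# whether the sooner reward is immediate or still in the future.
qhd_far = prefers_later(SMALL, LARGE, delay=5, beta=0.7, delta=DELTA)
qhd_now = prefers_later(SMALL, LARGE, delay=0, beta=0.7, delta=DELTA)

# Exponential discounter (beta = 1): the ranking is the same at any delay.
ed_far = prefers_later(SMALL, LARGE, delay=5, beta=1.0, delta=DELTA)
ed_now = prefers_later(SMALL, LARGE, delay=0, beta=1.0, delta=DELTA)
```

With these assumed values, the present-biased decision maker plans to wait when both rewards are distant but takes the smaller reward once it becomes immediate, while the exponential discounter ranks the two options the same way at both delays. This is the kind of time inconsistency the abstract associates with procrastination.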
About the journal:
The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.