Hakan Kılıç , Pelin Gülşah Canbolat , Evrim Didem Güneş
{"title":"Markov decision processes: Monotonicity of optimal policy in exponential and quasi-hyperbolic discounting parameters","authors":"Hakan Kılıç , Pelin Gülşah Canbolat , Evrim Didem Güneş","doi":"10.1016/j.ejor.2025.09.013","DOIUrl":null,"url":null,"abstract":"<div><div>Intertemporal preferences of decision makers, i.e., the way they discount delayed utilities, impact their decisions. Empirical evidence suggests that individuals commonly have hyperbolic discounting preferences. This can result in time-inconsistent behavior, e.g., procrastination, which may be a barrier to adopting preventive behavior such as machine maintenance and patient adherence to treatment. In this paper, we theoretically compare the actions of individuals based on their discounting characteristics. We consider the Hyperbolic Discounting (HD) model, which is more representative of individual behavior than Exponential Discounting (ED). We formulate a discrete-time finite-horizon Markov decision process with Quasi-Hyperbolic Discounting (QHD), an analytically tractable function representing HD and present sufficient conditions that ensure the monotonicity of the optimal policy in the discounting parameters. We consider submodular maximization or supermodular maximization problems. Our paper is the first to investigate the monotonicity of the optimal policy in QHD parameters for these problems. Moreover, we compare the optimal actions under ED and QHD. We apply our results to the settings of machine maintenance, individual health behavior and inventory control. We provide numerical examples that show there might not be monotonicity if our sufficient conditions are not met. Also, we explore the discrepancy between the expected total exponentially-discounted rewards of the actions obtained from QHD and of the actions that are optimal under ED, and observe that this discrepancy is affected mainly by the present bias.</div></div>","PeriodicalId":55161,"journal":{"name":"European Journal of Operational Research","volume":"328 3","pages":"Pages 877-893"},"PeriodicalIF":6.0000,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Operational Research","FirstCategoryId":"91","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0377221725007301","RegionNum":2,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPERATIONS RESEARCH & MANAGEMENT SCIENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Intertemporal preferences of decision makers, i.e., the way they discount delayed utilities, impact their decisions. Empirical evidence suggests that individuals commonly have hyperbolic discounting preferences. This can result in time-inconsistent behavior, e.g., procrastination, which may be a barrier to adopting preventive behavior such as machine maintenance and patient adherence to treatment. In this paper, we theoretically compare the actions of individuals based on their discounting characteristics. We consider the Hyperbolic Discounting (HD) model, which is more representative of individual behavior than Exponential Discounting (ED). We formulate a discrete-time finite-horizon Markov decision process with Quasi-Hyperbolic Discounting (QHD), an analytically tractable function representing HD and present sufficient conditions that ensure the monotonicity of the optimal policy in the discounting parameters. We consider submodular maximization or supermodular maximization problems. Our paper is the first to investigate the monotonicity of the optimal policy in QHD parameters for these problems. Moreover, we compare the optimal actions under ED and QHD. We apply our results to the settings of machine maintenance, individual health behavior and inventory control. We provide numerical examples that show there might not be monotonicity if our sufficient conditions are not met. Also, we explore the discrepancy between the expected total exponentially-discounted rewards of the actions obtained from QHD and of the actions that are optimal under ED, and observe that this discrepancy is affected mainly by the present bias.
期刊介绍:
The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.