Markov decision processes: Monotonicity of optimal policy in exponential and quasi-hyperbolic discounting parameters

Impact Factor 6.0; Zone 2 (Management); JCR Q1, Operations Research & Management Science

European Journal of Operational Research. Pub Date: 2025-09-23. DOI: 10.1016/j.ejor.2025.09.013

Hakan Kılıç, Pelin Gülşah Canbolat, Evrim Didem Güneş

European Journal of Operational Research, Volume 328, Issue 3, Pages 877-893. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0377221725007301


Abstract

Intertemporal preferences of decision makers, i.e., the way they discount delayed utilities, impact their decisions. Empirical evidence suggests that individuals commonly have hyperbolic discounting preferences. This can result in time-inconsistent behavior, e.g., procrastination, which may be a barrier to adopting preventive behavior such as machine maintenance and patient adherence to treatment. In this paper, we theoretically compare the actions of individuals based on their discounting characteristics. We consider the Hyperbolic Discounting (HD) model, which is more representative of individual behavior than Exponential Discounting (ED). We formulate a discrete-time finite-horizon Markov decision process with Quasi-Hyperbolic Discounting (QHD), an analytically tractable function representing HD, and present sufficient conditions that ensure the monotonicity of the optimal policy in the discounting parameters. We consider submodular maximization or supermodular maximization problems. Our paper is the first to investigate the monotonicity of the optimal policy in QHD parameters for these problems. Moreover, we compare the optimal actions under ED and QHD. We apply our results to the settings of machine maintenance, individual health behavior, and inventory control. We provide numerical examples showing that monotonicity may fail if our sufficient conditions are not met. Also, we explore the discrepancy between the expected total exponentially discounted rewards of the actions obtained from QHD and of the actions that are optimal under ED, and observe that this discrepancy is affected mainly by the present bias.
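To make the time-inconsistency point concrete, the following is a minimal sketch of the two discount functions, assuming the standard beta-delta form of quasi-hyperbolic discounting (weight 1 for an immediate reward and beta * delta^t for a reward t >= 1 periods away). The function names, the reward sizes, and the parameter values beta = 0.7 and delta = 0.95 are illustrative assumptions and are not taken from the paper.

```python
def ed_weight(t, delta):
    """Exponential discounting: weight delta**t on a reward t periods away."""
    return delta ** t


def qhd_weight(t, beta, delta):
    """Quasi-hyperbolic (beta-delta) discounting, assumed standard form:
    weight 1 now, beta * delta**t for any delayed reward (t >= 1)."""
    return 1.0 if t == 0 else beta * delta ** t


def prefers_later(weight, delay, small=10.0, large=12.0):
    """True if the larger reward at delay + 1 is preferred to the smaller
    reward at delay, as evaluated today. Reward sizes are hypothetical."""
    return weight(delay + 1) * large > weight(delay) * small


# Illustrative parameters (assumptions, not values from the paper).
beta, delta = 0.7, 0.95


def qhd(t):
    return qhd_weight(t, beta, delta)


def ed(t):
    return ed_weight(t, delta)


# Same pair of rewards, viewed from far away (delay 5) and up close (delay 0).
qhd_far, qhd_now = prefers_later(qhd, 5), prefers_later(qhd, 0)
ed_far, ed_now = prefers_later(ed, 5), prefers_later(ed, 0)

print("QHD: wait when distant?", qhd_far, "| wait when immediate?", qhd_now)
print("ED:  wait when distant?", ed_far, "| wait when immediate?", ed_now)
```

With these assumed values, the exponential discounter makes the same choice at both horizons, while the quasi-hyperbolic discounter plans to wait for the larger reward when both options are distant but switches to the smaller one once it becomes immediate. This reversal is driven by beta < 1, the present-bias parameter.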


Source journal

European Journal of Operational Research (Management Science: Operations Research & Management Science)

CiteScore: 11.90

Self-citation rate: 9.40%

Articles published: 786

Review time: 8.2 months

About the journal: The European Journal of Operational Research (EJOR) publishes high quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.