Inference of Utilities and Time Preference in Sequential Decision-Making

Haoyang Cao, Zhengqi Wu, Renyuan Xu
arXiv - QuantFin - Computational Finance, published 2024-05-24
DOI: arxiv-2405.15975 (https://doi.org/arxiv-2405.15975)

Abstract

This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme with a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and by establishing the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, which facilitates the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples: Merton's problem and an investment problem with unhedgeable risks. Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.
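To make the learning step more concrete, the following is a minimal sketch of maximum likelihood estimation in an entropy-regularized, discrete-time MDP. It is not the authors' algorithm. It assumes a small tabular MDP with randomly generated dynamics, a hypothetical one-parameter reward family `gain - theta * risk` standing in for a utility with risk-aversion parameter `theta`, a constant discount factor rather than a time-varying rate, and a plain grid search in place of a gradient-based optimizer. All function and variable names are illustrative.

```python
import numpy as np


def soft_policy(reward, transition, discount, temperature=1.0, n_iter=500):
    """Entropy-regularized value iteration; returns the (S, A) softmax policy."""
    v = np.zeros(reward.shape[0])
    for _ in range(n_iter):
        q = reward + discount * (transition @ v)  # Q-values, shape (S, A)
        q_max = q.max(axis=1)
        # Soft (log-sum-exp) Bellman backup, stabilised by subtracting the row max.
        v = q_max + temperature * np.log(
            np.exp((q - q_max[:, None]) / temperature).sum(axis=1)
        )
    q = reward + discount * (transition @ v)
    weights = np.exp((q - q.max(axis=1, keepdims=True)) / temperature)
    return weights / weights.sum(axis=1, keepdims=True)


def log_likelihood(policy, counts):
    """Log-likelihood of observed action counts[s, a] under a stochastic policy."""
    return float((counts * np.log(policy)).sum())


def fit_preferences(counts, gain, risk, transition, thetas, discounts):
    """Grid-search MLE over (hypothetical risk aversion theta, discount factor)."""
    best, best_ll = None, -np.inf
    for theta in thetas:
        for discount in discounts:
            policy = soft_policy(gain - theta * risk, transition, discount)
            ll = log_likelihood(policy, counts)
            if ll > best_ll:
                best, best_ll = (float(theta), float(discount)), ll
    return best, best_ll


def make_toy_mdp(n_states=4, n_actions=3, seed=0):
    """Random toy MDP (an assumption for illustration, not a financial model)."""
    rng = np.random.default_rng(seed)
    gain = rng.normal(size=(n_states, n_actions))
    risk = rng.uniform(size=(n_states, n_actions))
    transition = rng.uniform(size=(n_states, n_actions, n_states))
    transition /= transition.sum(axis=2, keepdims=True)  # rows sum to one
    return gain, risk, transition


if __name__ == "__main__":
    gain, risk, transition = make_toy_mdp()
    # The simulated "client" acts with theta = 1.0 and discount = 0.9.
    true_policy = soft_policy(gain - 1.0 * risk, transition, 0.9)
    rng = np.random.default_rng(1)
    # Simulate 2000 observed decisions per state from the client policy.
    counts = np.stack([rng.multinomial(2000, p) for p in true_policy])
    estimate, _ = fit_preferences(
        counts, gain, risk, transition,
        thetas=np.linspace(0.0, 2.0, 9),
        discounts=np.linspace(0.5, 0.95, 10),
    )
    print("estimated (theta, discount):", estimate)
```

Entropy regularization makes the optimal policy a strictly positive softmax over Q-values, so every observed action has nonzero probability and the log-likelihood is finite. Recovering `theta` and the discount factor jointly depends on the observed behaviour being informative enough to tell them apart, which is the kind of question the identifiability conditions in the abstract address.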