{"title":"在不完全市场中学习默顿策略:递归熵正则化与有偏高斯探索","authors":"Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou","doi":"arxiv-2312.11797","DOIUrl":null,"url":null,"abstract":"We study Merton's expected utility maximization problem in an incomplete\nmarket, characterized by a factor process in addition to the stock price\nprocess, where all the model primitives are unknown. We take the reinforcement\nlearning (RL) approach to learn optimal portfolio policies directly by\nexploring the unknown market, without attempting to estimate the model\nparameters. Based on the entropy-regularization framework for general\ncontinuous-time RL formulated in Wang et al. (2020), we propose a recursive\nweighting scheme on exploration that endogenously discounts the current\nexploration reward by the past accumulative amount of exploration. Such a\nrecursive regularization restores the optimality of Gaussian exploration.\nHowever, contrary to the existing results, the optimal Gaussian policy turns\nout to be biased in general, due to the interwinding needs for hedging and for\nexploration. We present an asymptotic analysis of the resulting errors to show\nhow the level of exploration affects the learned policies. Furthermore, we\nestablish a policy improvement theorem and design several RL algorithms to\nlearn Merton's optimal strategies. At last, we carry out both simulation and\nempirical studies with a stochastic volatility environment to demonstrate the\nefficiency and robustness of the RL algorithms in comparison to the\nconventional plug-in method.","PeriodicalId":501045,"journal":{"name":"arXiv - QuantFin - Portfolio Management","volume":"80 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration\",\"authors\":\"Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou\",\"doi\":\"arxiv-2312.11797\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study Merton's expected utility maximization problem in an incomplete\\nmarket, characterized by a factor process in addition to the stock price\\nprocess, where all the model primitives are unknown. We take the reinforcement\\nlearning (RL) approach to learn optimal portfolio policies directly by\\nexploring the unknown market, without attempting to estimate the model\\nparameters. Based on the entropy-regularization framework for general\\ncontinuous-time RL formulated in Wang et al. (2020), we propose a recursive\\nweighting scheme on exploration that endogenously discounts the current\\nexploration reward by the past accumulative amount of exploration. Such a\\nrecursive regularization restores the optimality of Gaussian exploration.\\nHowever, contrary to the existing results, the optimal Gaussian policy turns\\nout to be biased in general, due to the interwinding needs for hedging and for\\nexploration. We present an asymptotic analysis of the resulting errors to show\\nhow the level of exploration affects the learned policies. Furthermore, we\\nestablish a policy improvement theorem and design several RL algorithms to\\nlearn Merton's optimal strategies. 
At last, we carry out both simulation and\\nempirical studies with a stochastic volatility environment to demonstrate the\\nefficiency and robustness of the RL algorithms in comparison to the\\nconventional plug-in method.\",\"PeriodicalId\":501045,\"journal\":{\"name\":\"arXiv - QuantFin - Portfolio Management\",\"volume\":\"80 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - QuantFin - Portfolio Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2312.11797\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Portfolio Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2312.11797","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration
We study Merton's expected utility maximization problem in an incomplete
market, characterized by a factor process in addition to the stock price
process, where all the model primitives are unknown. We take the reinforcement
learning (RL) approach to learn optimal portfolio policies directly by
exploring the unknown market, without attempting to estimate the model
parameters. Based on the entropy-regularization framework for general
continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme for exploration that endogenously discounts the current exploration reward by the accumulated amount of past exploration. Such a
recursive regularization restores the optimality of Gaussian exploration.
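To make the regularization idea concrete, the following is a schematic sketch rather than the paper's exact formulation: the first display is the standard entropy-regularized objective in the spirit of Wang et al. (2020), with temperature \lambda and differential entropy \mathcal{H} of the randomized policy; the second display shows one illustrative way a recursive weight w (assumed decreasing) could discount the current exploration reward by the accumulated amount of past exploration.

\[
J(\pi) = \mathbb{E}\!\left[\, U\!\left(X_T^{\pi}\right) + \lambda \int_0^T \mathcal{H}(\pi_t)\,\mathrm{d}t \,\right],
\qquad
\mathcal{H}(\pi_t) = -\int \pi_t(a)\,\log \pi_t(a)\,\mathrm{d}a,
\]
\[
J^{\mathrm{rec}}(\pi) = \mathbb{E}\!\left[\, U\!\left(X_T^{\pi}\right) + \lambda \int_0^T w\!\left(\int_0^t \mathcal{H}(\pi_s)\,\mathrm{d}s\right) \mathcal{H}(\pi_t)\,\mathrm{d}t \,\right].
\]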
However, in contrast to existing results, the optimal Gaussian policy turns out to be biased in general, owing to the intertwined needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show
how the level of exploration affects the learned policies. Furthermore, we
establish a policy improvement theorem and design several RL algorithms to
learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms in comparison with the conventional plug-in method.
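As a complement, below is a minimal, self-contained sketch of model-free Gaussian exploration in a simulated stochastic-volatility market; it is not one of the paper's algorithms. The market simulator, all parameter values, the rollout helper, and the REINFORCE-style update of the policy mean are illustrative assumptions; the learner uses only the sampled actions and the realized log-utility of terminal wealth, never the model parameters.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market parameters (used only by the simulator, hidden from the learner).
MU, R = 0.08, 0.02                              # stock drift and risk-free rate
KAPPA, THETA, XI, RHO = 2.0, 0.04, 0.3, -0.5    # Heston-type variance factor
DT, T = 1.0 / 252, 1.0
N_STEPS = int(T / DT)

def rollout(mean, std, x0=1.0, v0=0.04):
    """One episode: sample the stock allocation from N(mean, std^2) at each step;
    return the log-utility of terminal wealth and the score (gradient of the
    log-likelihood of the sampled actions with respect to the policy mean)."""
    x, v, score = x0, v0, 0.0
    for _ in range(N_STEPS):
        a = rng.normal(mean, std)                # Gaussian exploration
        score += (a - mean) / std**2             # d/d(mean) of log N(a; mean, std^2)
        z1, z2 = rng.normal(size=2)
        vol = np.sqrt(max(v, 1e-8))
        ret = (R + a * (MU - R)) * DT + a * vol * np.sqrt(DT) * z1
        x *= max(1.0 + ret, 1e-6)                # crude guard against bankruptcy
        v += KAPPA * (THETA - v) * DT + XI * vol * np.sqrt(DT) * (
            RHO * z1 + np.sqrt(1.0 - RHO**2) * z2)
    return np.log(x), score

mean, std, lr, baseline = 0.0, 0.3, 0.5, 0.0     # fixed exploration level; learn the mean only
for episode in range(5000):
    utility, score = rollout(mean, std)
    baseline += 0.05 * (utility - baseline)      # running baseline to reduce variance
    mean += lr * (utility - baseline) * score / N_STEPS   # REINFORCE-style update

print(f"learned mean stock allocation: {mean:.2f}")

In the paper's setting, the learned Gaussian policy is in general biased away from the classical Merton fraction because of the interaction between hedging and exploration; this toy sketch makes no attempt to capture that effect.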