Reinforcement Learning for Jump-Diffusions

Xuefeng Gao, Lingfei Li, Xun Yu Zhou
arXiv - QuantFin - Mathematical Finance, 2024-05-26
doi: arxiv-2405.16449 (https://doi.org/arxiv-2405.16449)
Citations: 0

Abstract

We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration-exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. Finally, we investigate as an application the mean-variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps.
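For readers unfamiliar with the kind of data the abstract refers to, the following is a minimal sketch of simulating a stock price that follows a jump-diffusion. It is not taken from the paper: the Merton-type model (geometric Brownian motion plus compound Poisson log-normal jumps), the function names `sample_poisson` and `simulate_jump_diffusion`, and all parameter names are assumptions chosen for illustration. Setting `jump_rate` to zero recovers a pure diffusion.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) variate with Knuth's method (suited to small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_jump_diffusion(x0, mu, sigma, jump_rate, jump_mean, jump_std,
                            horizon, n_steps, seed=0):
    """Simulate one price path of a Merton-type jump-diffusion (illustrative model).

    Each log-price increment is a Gaussian diffusion term plus the sum of
    Gaussian log-jumps that arrive in the step according to a Poisson process.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    log_x = math.log(x0)
    path = [x0]
    for _ in range(n_steps):
        # Diffusion part: drift-corrected Brownian increment of the log-price.
        diffusion = (mu - 0.5 * sigma ** 2) * dt \
            + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: number of jumps in [t, t + dt), then their total log-size.
        n_jumps = sample_poisson(jump_rate * dt, rng)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        log_x += diffusion + jumps
        path.append(math.exp(log_x))
    return path


if __name__ == "__main__":
    # One year of daily prices with, on average, one jump per year.
    prices = simulate_jump_diffusion(100.0, 0.05, 0.2, 1.0, -0.1, 0.15, 1.0, 252, seed=42)
    print(len(prices), prices[-1])
```

A path generated this way could serve as synthetic input when experimenting with learning algorithms that, as the abstract notes, do not need to know in advance whether the data contain jumps.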