Reinforcement Learning for Jump-Diffusions
Xuefeng Gao, Lingfei Li, Xun Yu Zhou
arXiv - QuantFin - Mathematical Finance, 2024-05-26
https://doi.org/arxiv-2405.16449

We study continuous-time reinforcement learning (RL) for stochastic control
in which system dynamics are governed by jump-diffusion processes. We formulate
an entropy-regularized exploratory control problem with stochastic policies to
capture the exploration-exploitation balance essential for RL. Unlike the pure
diffusion case initially studied by Wang et al. (2020), the derivation of the
exploratory dynamics under jump-diffusions calls for a careful formulation of
the jump part. Through a theoretical analysis, we find that one can simply use
the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a,
2023), originally developed for controlled diffusions, without needing to check
a priori whether the underlying data come from a pure diffusion or a
jump-diffusion. However, we show that the presence of jumps ought to affect
parameterizations of actors and critics in general. Finally, as an application,
we investigate the mean-variance portfolio selection problem with the stock
price modelled as a jump-diffusion, and show that both the RL algorithms and
the parameterizations are invariant with respect to jumps.
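
To make the notion of jump-diffusion data concrete, the following is a minimal sketch of simulating a stock price path with jumps. The abstract does not specify a model, so this assumes a Merton-type jump-diffusion: geometric Brownian motion plus compound Poisson jumps with normally distributed log-jump sizes. The function names, parameters, and numerical values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate with the given mean (Knuth's multiplication method)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_jump_diffusion(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                            horizon, n_steps, seed=0):
    """Simulate one price path on an equally spaced time grid.

    Assumed model (Merton-type, an illustrative choice):
        dS_t / S_{t-} = mu dt + sigma dW_t + (exp(Y) - 1) dN_t,
    where N is a Poisson process with intensity jump_rate and the
    log-jump sizes Y are i.i.d. normal(jump_mean, jump_std^2).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    log_price = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # Continuous part of the log-price increment (drift plus Brownian term).
        diffusion = (mu - 0.5 * sigma ** 2) * dt \
            + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: a Poisson number of normal log-jumps within the step.
        n_jumps = sample_poisson(rng, jump_rate * dt)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        log_price += diffusion + jumps
        path.append(math.exp(log_price))
    return path


# Same simulator, with and without jumps (jump_rate = 0 gives a pure diffusion).
with_jumps = simulate_jump_diffusion(100.0, 0.05, 0.2, 1.0, -0.1, 0.15, 1.0, 252, seed=1)
no_jumps = simulate_jump_diffusion(100.0, 0.05, 0.2, 0.0, 0.0, 0.0, 1.0, 252, seed=1)
```

Setting `jump_rate` to zero reduces the simulator to a pure diffusion, which gives a simple way to generate both kinds of data when experimenting with a learning algorithm that is not told which kind it is observing.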