Reinforcement Learning for Jump-Diffusions
Xuefeng Gao, Lingfei Li, Xun Yu Zhou
arXiv - QuantFin - Mathematical Finance, 2024-05-26
DOI: arxiv-2405.16449 (https://doi.org/arxiv-2405.16449)
Citations: 0
Abstract
We study continuous-time reinforcement learning (RL) for stochastic control
in which system dynamics are governed by jump-diffusion processes. We formulate
an entropy-regularized exploratory control problem with stochastic policies to
capture the exploration–exploitation balance essential for RL. Unlike the pure
diffusion case initially studied by Wang et al. (2020), the derivation of the
exploratory dynamics under jump-diffusions calls for a careful formulation of
the jump part. Through a theoretical analysis, we find that one can simply use
the same policy evaluation and q-learning algorithms in Jia and Zhou (2022a,
2023), originally developed for controlled diffusions, without needing to check
a priori whether the underlying data come from a pure diffusion or a
jump-diffusion. However, we show that the presence of jumps ought to affect
parameterizations of actors and critics in general. Finally, we investigate as
an application the mean-variance portfolio selection problem with stock price
modelled as a jump-diffusion, and show that both RL algorithms and
parameterizations are invariant with respect to jumps.
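
To make the setting concrete, the following is a minimal sketch of how a stock price path driven by a jump-diffusion could be simulated. It is not taken from the paper: it assumes a Merton-style model with constant drift and volatility, Poisson jump arrivals, and normally distributed log-jump sizes, and every name in it (`simulate_jump_diffusion`, `poisson_count`, `jump_rate`, `jump_mean`, `jump_std`) is hypothetical. Setting `jump_rate` to zero recovers a pure diffusion, which is the distinction the abstract says the learning algorithms need not check in advance.

```python
import math
import random


def poisson_count(rng, mean):
    """Sample a Poisson(mean) count by fitting unit-rate exponential gaps into [0, mean)."""
    if mean <= 0.0:
        return 0
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_jump_diffusion(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                            horizon, n_steps, seed=0):
    """Simulate one price path on an even time grid (illustrative model, assumed here).

    Log-price increments are the sum of a Brownian part with drift `mu` and
    volatility `sigma`, and a compound Poisson part with intensity `jump_rate`
    and normal log-jump sizes N(jump_mean, jump_std**2).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    log_price = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        # Continuous (diffusion) part of the log-price increment.
        increment = (mu - 0.5 * sigma ** 2) * dt
        increment += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Jump part: add one log-jump per Poisson arrival in this step.
        for _ in range(poisson_count(rng, jump_rate * dt)):
            increment += rng.gauss(jump_mean, jump_std)
        log_price += increment
        path.append(math.exp(log_price))
    return path


# Pure diffusion (no jumps) versus a path with occasional downward jumps.
diffusion_path = simulate_jump_diffusion(100.0, 0.05, 0.2, 0.0, 0.0, 0.0, 1.0, 252)
jump_path = simulate_jump_diffusion(100.0, 0.05, 0.2, 3.0, -0.05, 0.1, 1.0, 252)
```

Both paths have the same format (a list of prices on a common time grid), so a data-driven algorithm would receive the same kind of input in either case.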