Policy Gradient Learning Methods for Stochastic Control with Exit Time and Applications to Share Repurchase Pricing
Mohamed Hamdouche, P. Henry-Labordère, H. Pham
Applied Mathematical Finance, pp. 439-456. Published 2022-11-02.
DOI: https://doi.org/10.1080/1350486X.2023.2239850
Citations: 1
Abstract
We develop policy gradient methods for stochastic control with exit time in a model-free setting. We propose two types of algorithms: one learns the optimal policy directly, and the other alternates between learning the value function (critic) and the optimal control (actor). The use of randomized policies is crucial, notably for overcoming the issue related to the exit time in the gradient computation. We demonstrate the effectiveness of our approach by implementing our numerical schemes on the problem of share repurchase pricing. Our results show that the proposed policy gradient methods outperform PDE or other neural network techniques in a model-based setting. Furthermore, our algorithms are flexible enough to incorporate realistic market conditions such as price impact or transaction costs.
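To illustrate why a randomized policy helps with the exit time, here is a minimal sketch of a score-function (REINFORCE-style) policy gradient on a toy one-dimensional problem. It is not the authors' algorithm and not the share repurchase model: the dynamics, the cost, the barrier, the linear-Gaussian policy, and every parameter value are assumptions chosen for illustration. The point it shows is that the gradient estimate only needs the log-density of the sampled actions up to the exit time, so the exit time itself is never differentiated and the dynamics are only simulated, never differentiated.

```python
import numpy as np


def simulate_episode(theta, rng, x0=0.0, dt=0.05, n_steps=40,
                     sigma=0.3, policy_std=0.5, barrier=1.0):
    """Simulate one path until it exits (-barrier, barrier) or the horizon ends.

    Returns the total reward and the accumulated score, i.e. the sum over
    pre-exit steps of grad_theta log pi(a_t | x_t). All dynamics and costs
    are hypothetical toy choices.
    """
    x = x0
    total_reward = 0.0
    score = np.zeros(2)
    for _ in range(n_steps):
        if abs(x) >= barrier:  # exit time reached: stop accumulating
            break
        features = np.array([1.0, x])
        mean = features @ theta  # linear mean of the Gaussian policy
        action = mean + policy_std * rng.standard_normal()
        # Gradient of the Gaussian log-density with respect to theta.
        score += (action - mean) / policy_std**2 * features
        total_reward -= (x**2 + 0.1 * action**2) * dt  # running cost
        # Euler step of the controlled diffusion (used as a black-box simulator).
        x = x + action * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    total_reward -= x**2  # terminal cost at the exit time or at the horizon
    return total_reward, score


def policy_gradient_estimate(theta, rng, n_episodes=256):
    """Monte Carlo score-function estimate of the gradient of expected reward."""
    rewards = np.empty(n_episodes)
    scores = np.empty((n_episodes, 2))
    for i in range(n_episodes):
        rewards[i], scores[i] = simulate_episode(theta, rng)
    # Batch-mean baseline for variance reduction (adds a small finite-sample bias).
    baseline = rewards.mean()
    grad = ((rewards - baseline)[:, None] * scores).mean(axis=0)
    return grad, baseline


def train(n_iters=50, lr=0.5, seed=0):
    """Plain stochastic gradient ascent on the policy parameters."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    for _ in range(n_iters):
        grad, _ = policy_gradient_estimate(theta, rng)
        theta += lr * grad
    return theta
```

Calling `train()` returns the two learned policy parameters. An actor-critic variant in the spirit of the abstract would replace the batch-mean baseline with a learned value function; that is left out here to keep the sketch short.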
About the journal:
The journal encourages the confident use of applied mathematics and mathematical modelling in finance. The journal publishes papers on the following:
• modelling of financial and economic primitives (interest rates, asset prices, etc.);
• modelling market behaviour;
• modelling market imperfections;
• pricing of financial derivative securities;
• hedging strategies;
• numerical methods;
• financial engineering.