{"title":"Reinforcement learning with dynamic convex risk measures","authors":"Anthony Coache, Sebastian Jaimungal","doi":"10.1111/mafi.12388","DOIUrl":"10.1111/mafi.12388","url":null,"abstract":"<p>We develop an approach for solving time-consistent risk-sensitive stochastic optimization problems using model-free reinforcement learning (RL). Specifically, we assume agents assess the risk of a sequence of random variables using dynamic convex risk measures. We employ a time-consistent dynamic programming principle to determine the value of a particular policy, and develop policy gradient update rules that aid in obtaining optimal policies. We further develop an actor–critic style algorithm using neural networks to optimize over policies. Finally, we demonstrate the performance and flexibility of our approach by applying it to three optimization problems: statistical arbitrage trading strategies, financial hedging, and obstacle avoidance robot control.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"34 2","pages":"557-587"},"PeriodicalIF":1.6,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12388","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73631321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trading with the crowd","authors":"Eyal Neuman, Moritz Voß","doi":"10.1111/mafi.12390","DOIUrl":"10.1111/mafi.12390","url":null,"abstract":"<p>We formulate and solve a multi-player stochastic differential game between financial agents who seek to cost-efficiently liquidate their position in a risky asset in the presence of jointly aggregated transient price impact, along with taking into account a common general price predicting signal. The unique Nash-equilibrium strategies reveal how each agent's liquidation policy adjusts the predictive trading signal to the aggregated transient price impact induced by all other agents. This unfolds a quantitative relation between trading signals and the order flow in crowded markets. We also formulate and solve the corresponding mean field game in the limit of infinitely many agents. We prove that the equilibrium trading speed and the value function of an agent in the finite <i>N</i>-player game converges to the corresponding trading speed and value function in the mean field game at rate $O(N^{-2})$. In addition, we prove that the mean field optimal strategy provides an approximate Nash-equilibrium for the finite-player game.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"548-617"},"PeriodicalIF":1.6,"publicationDate":"2023-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12390","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45517220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent advances in reinforcement learning in finance","authors":"Ben Hambly, Renyuan Xu, Huining Yang","doi":"10.1111/mafi.12382","DOIUrl":"10.1111/mafi.12382","url":null,"abstract":"<p>The rapid changes in the finance industry due to the increasing amount of data have revolutionized the techniques on data processing and data analysis and brought new theoretical and computational challenges. In contrast to classical stochastic control theory and other analytical approaches for solving financial decision-making problems that heavily rely on model assumptions, new developments from reinforcement learning (RL) are able to make full use of the large amount of financial data with fewer model assumptions and to improve decisions in complex financial environments. This survey paper aims to review the recent developments and use of RL approaches in finance. We give an introduction to Markov decision processes, which is the setting for many of the commonly used RL approaches. Various algorithms are then introduced with a focus on value- and policy-based methods that do not require any model assumptions. Connections are made with neural networks to extend the framework to encompass deep RL algorithms. We then discuss in detail the application of these RL algorithms in a variety of decision-making problems in finance, including optimal execution, portfolio optimization, option pricing and hedging, market making, smart order routing, and robo-advising. Our survey concludes by pointing out a few possible future directions for research.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"437-503"},"PeriodicalIF":1.6,"publicationDate":"2023-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12382","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47028350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analytical solvability and exact simulation in models with affine stochastic volatility and Lévy jumps","authors":"Pingping Zeng, Ziqing Xu, Pingping Jiang, Yue Kuen Kwok","doi":"10.1111/mafi.12387","DOIUrl":"10.1111/mafi.12387","url":null,"abstract":"<p>We investigate analytical solvability of models with affine stochastic volatility (SV) and Lévy jumps by deriving a unified formula for the conditional moment generating function of the log-asset price and providing the condition under which this new formula is explicit. The results lay a foundation for a range of valuation, calibration, and econometric problems. We then combine our theoretical results, the Hilbert transform method, various interpolation techniques, with the dimension reduction technique to propose unified simulation schemes for solvable models with affine SV and Lévy jumps. In contrast to traditional exact simulation methods, our approach is applicable to a broad class of models, maintains good accuracy, and enables efficient pricing of discretely monitored path-dependent derivatives. We analyze various sources of errors arising from the simulation approach and present error bounds. Finally, extensive numerical results demonstrate that our method is highly accurate, efficient, simple to implement, and widely applicable.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"842-890"},"PeriodicalIF":1.6,"publicationDate":"2023-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43824061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Equilibria of time-inconsistent stopping for one-dimensional diffusion processes","authors":"Erhan Bayraktar, Zhenhua Wang, Zhou Zhou","doi":"10.1111/mafi.12385","DOIUrl":"10.1111/mafi.12385","url":null,"abstract":"<p>We consider three equilibrium concepts proposed in the literature for time-inconsistent stopping problems, including mild equilibria (introduced in Huang and Nguyen-Huu (2018)), weak equilibria (introduced in Christensen and Lindensjö (2018)), and strong equilibria (introduced in Bayraktar et al. (2021)). The discount function is assumed to be log subadditive and the underlying process is one-dimensional diffusion. We first provide necessary and sufficient conditions for the characterization of weak equilibria. The smooth-fit condition is obtained as a by-product. Next, based on the characterization of weak equilibria, we show that an optimal mild equilibrium is also weak. Then we provide conditions under which a weak equilibrium is strong. We further show that an optimal mild equilibrium is also strong under a certain condition. Finally, we provide several examples including one showing a weak equilibrium may not be strong, and another one showing a strong equilibrium may not be optimal mild.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"797-841"},"PeriodicalIF":1.6,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12385","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44453743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Credit risk pricing in a consumption-based equilibrium framework with incomplete accounting information","authors":"Junchi Ma, Mobolaji Ogunsolu, Jinniao Qiu, Ayşe Deniz Sezer","doi":"10.1111/mafi.12386","DOIUrl":"10.1111/mafi.12386","url":null,"abstract":"<p>We present a consumption-based equilibrium framework for credit risk pricing based on the Epstein–Zin (EZ) preferences where the default time is modeled as the first hitting time of a default boundary and bond investors have imperfect/partial information about the firm value. The imperfect information is generated by the underlying observed state variables and a noisy observation process of the firm value. In addition, the consumption, the volatility, and the firm value process are modeled to follow affine diffusion processes. Using the EZ equilibrium solution as the pricing kernel, we provide an equivalent pricing measure to compute the prices of financial derivatives as discounted values of the future payoffs given the incomplete information. The price of a zero-coupon bond is represented in terms of the solutions of a <i>stochastic</i> partial differential equation (SPDE) and a <i>deterministic</i> PDE; the self-contained proofs are provided for both this representation and the well-posedness of the involved SPDE. Furthermore, this SPDE is numerically solved, which yields some insights into the relationship between the structure of the yield spreads and the model parameters.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"666-708"},"PeriodicalIF":1.6,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/mafi.12386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46688591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Noncausal affine processes with applications to derivative pricing","authors":"Christian Gouriéroux, Yang Lu","doi":"10.1111/mafi.12384","DOIUrl":"10.1111/mafi.12384","url":null,"abstract":"<p>Linear factor models, where the factors are affine processes, play a key role in Finance, since they allow for quasi-closed form expressions of the term structure of risks. We introduce the class of noncausal affine linear factor models by considering factors that are affine in reverse time. These models are especially relevant for pricing sequences of speculative bubbles. We show that they feature nonaffine dynamics in calendar time, while still providing (quasi) closed form term structures and derivative pricing formulas. The framework is illustrated with term structure of interest rates and European call option pricing examples.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"766-796"},"PeriodicalIF":1.6,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47358183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective algorithms for optimal portfolio deleveraging problem with cross impact","authors":"Hezhi Luo, Yuanyuan Chen, Xianye Zhang, Duan Li, Huixian Wu","doi":"10.1111/mafi.12383","DOIUrl":"10.1111/mafi.12383","url":null,"abstract":"<p>We investigate the optimal portfolio deleveraging (OPD) problem with permanent and temporary price impacts, where the objective is to maximize equity while meeting a prescribed debt/equity requirement. We take the real situation with cross impact among different assets into consideration. The resulting problem is, however, a nonconvex quadratic program with a quadratic constraint and a box constraint, which is known to be NP-hard. In this paper, we first develop a successive convex optimization (SCO) approach for solving the OPD problem and show that the SCO algorithm converges to a KKT point of its transformed problem. Second, we propose an effective global algorithm for the OPD problem, which integrates the SCO method, simple convex relaxation, and a branch-and-bound framework, to identify a global optimal solution to the OPD problem within a prespecified ε-tolerance. We establish the global convergence of our algorithm and estimate its complexity. We also conduct numerical experiments to demonstrate the effectiveness of our proposed algorithms with both real data and randomly generated medium- and large-scale OPD instances.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"34 1","pages":"36-89"},"PeriodicalIF":1.6,"publicationDate":"2023-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42650971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A general approximation method for optimal stopping and random delay","authors":"Pengzhan Chen, Yingda Song","doi":"10.1111/mafi.12380","DOIUrl":"10.1111/mafi.12380","url":null,"abstract":"<p>This study examines the continuous-time optimal stopping problem with an infinite horizon under Markov processes. Existing research focuses on finding explicit solutions under certain assumptions of the reward function or underlying process; however, these assumptions may either not be fulfilled or be difficult to validate in practice. We developed a continuous-time Markov chain (CTMC) approximation method to find the optimal solution, which applies to general reward functions and underlying Markov processes. We demonstrated that our method can be used to solve the optimal stopping problem with a random delay, in which the delay could be either an independent random variable or a function of the underlying process. We established a theoretical upper bound for the approximation error to facilitate error control. Furthermore, we designed a two-stage scheme to implement our method efficiently. The numerical results show that the proposed method is accurate and rapid under various model specifications.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"34 1","pages":"5-35"},"PeriodicalIF":1.6,"publicationDate":"2023-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47079969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Markov decision processes under model uncertainty","authors":"Ariel Neufeld, Julian Sester, Mario Šikić","doi":"10.1111/mafi.12381","DOIUrl":"10.1111/mafi.12381","url":null,"abstract":"<p>We introduce a general framework for Markov decision problems under model uncertainty in a discrete-time infinite horizon setting. By providing a dynamic programming principle, we obtain a local-to-global paradigm, namely solving a local, that is, a one time-step robust optimization problem leads to an optimizer of the global (i.e., infinite time-steps) robust stochastic optimal control problem, as well as to a corresponding worst-case measure. Moreover, we apply this framework to portfolio optimization involving data of the S&P 500. We present two different types of ambiguity sets; one is fully data-driven given by a Wasserstein-ball around the empirical measure, the second one is described by a parametric set of multivariate normal distributions, where the corresponding uncertainty sets of the parameters are estimated from the data. It turns out that in scenarios where the market is volatile or bearish, the optimal portfolio strategies from the corresponding robust optimization problem outperform the ones without model uncertainty, showcasing the importance of taking model uncertainty into account.</p>","PeriodicalId":49867,"journal":{"name":"Mathematical Finance","volume":"33 3","pages":"618-665"},"PeriodicalIF":1.6,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43218515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}