Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients

Parisa Davar, Frédéric Godin, Jose Garrido
{"title":"Catastrophic-risk-aware reinforcement learning with extreme-value-theory-based policy gradients","authors":"Parisa Davar, Frédéric Godin, Jose Garrido","doi":"arxiv-2406.15612","DOIUrl":null,"url":null,"abstract":"This paper tackles the problem of mitigating catastrophic risk (which is risk\nwith very low frequency but very high severity) in the context of a sequential\ndecision making process. This problem is particularly challenging due to the\nscarcity of observations in the far tail of the distribution of cumulative\ncosts (negative rewards). A policy gradient algorithm is developed, that we\ncall POTPG. It is based on approximations of the tail risk derived from extreme\nvalue theory. Numerical experiments highlight the out-performance of our method\nover common benchmarks, relying on the empirical distribution. An application\nto financial risk management, more precisely to the dynamic hedging of a\nfinancial option, is presented.","PeriodicalId":501128,"journal":{"name":"arXiv - QuantFin - Risk Management","volume":"72 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Risk Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.15612","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper tackles the problem of mitigating catastrophic risk (risk with very low frequency but very high severity) in the context of a sequential decision-making process. The problem is particularly challenging due to the scarcity of observations in the far tail of the distribution of cumulative costs (negative rewards). A policy gradient algorithm, which we call POTPG, is developed. It is based on approximations of the tail risk derived from extreme value theory. Numerical experiments highlight the outperformance of our method over common benchmarks that rely on the empirical distribution. An application to financial risk management, more precisely to the dynamic hedging of a financial option, is presented.
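The abstract does not spell out the algorithm, but the name POTPG and the extreme-value-theory basis suggest a peaks-over-threshold (POT) estimate of tail risk: fit a generalized Pareto distribution (GPD) to cost exceedances above a high threshold and read value-at-risk (VaR) and conditional value-at-risk (CVaR) off the fitted tail, rather than off empirical quantiles. The following is a minimal Python sketch of that standard POT estimator, offered under stated assumptions; the function name `pot_cvar`, the threshold quantile, and the heavy-tailed toy costs are illustrative choices, not the paper's implementation.

```python
# Minimal sketch of a peaks-over-threshold (POT) tail-risk estimate of the
# kind the abstract alludes to. All names and defaults here are assumptions
# for illustration, not the paper's actual POTPG implementation.
import numpy as np
from scipy.stats import genpareto

def pot_cvar(costs, alpha=0.99, threshold_quantile=0.90):
    """Estimate VaR_alpha and CVaR_alpha of `costs` by fitting a GPD to
    exceedances over a high empirical threshold (McNeil-Frey POT method)."""
    costs = np.asarray(costs, dtype=float)
    u = np.quantile(costs, threshold_quantile)   # POT threshold
    exceedances = costs[costs > u] - u           # peaks over the threshold
    # Fit the GPD to exceedances; fix loc=0 so only shape/scale are estimated.
    xi, _, beta = genpareto.fit(exceedances, floc=0.0)
    zeta_u = exceedances.size / costs.size       # empirical P(cost > u)
    # Standard POT formulas; valid for xi != 0, and xi < 1 for the CVaR.
    var_alpha = u + (beta / xi) * (((1.0 - alpha) / zeta_u) ** (-xi) - 1.0)
    cvar_alpha = (var_alpha + beta - xi * u) / (1.0 - xi)
    return var_alpha, cvar_alpha

# Usage: simulate episode costs under the current policy, then estimate the
# far-tail risk that a CVaR-style policy gradient would seek to reduce.
rng = np.random.default_rng(0)
costs = rng.standard_t(df=3, size=10_000)        # heavy-tailed toy costs
var99, cvar99 = pot_cvar(costs, alpha=0.99)
print(f"VaR 99%: {var99:.3f}, CVaR 99%: {cvar99:.3f}")
```

In a tail-risk-aware policy gradient, an estimate of this kind would presumably stand in for the empirical-quantile estimate used by the benchmark methods; with few observations beyond the threshold, the parametric GPD tail extrapolates where empirical quantiles become noisy, which is consistent with the advantage the abstract claims in the far tail.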