Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning

JCR quartile: Q1 (Mathematics)
Loris Cannelli, Giuseppe Nuti, Marzio Sala, Oleg Szehr
{"title":"使用强化学习的套期保值:语境k-武装强盗与q -学习","authors":"Loris Cannelli ,&nbsp;Giuseppe Nuti ,&nbsp;Marzio Sala ,&nbsp;Oleg Szehr","doi":"10.1016/j.jfds.2023.100101","DOIUrl":null,"url":null,"abstract":"<div><p>The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but is also undesirable due to high transaction costs. A variety of methods have been proposed to balance between effective replication and losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, where particular attention is given to Recurrent Neural Network systems and variations of the <em>Q</em>-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, the run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual <em>k</em>-armed bandit problem, which is motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the <em>k</em>-armed bandit model naturally fits to the Profit and Loss formulation of hedging, providing for a more accurate and sample efficient approach than <em>Q</em>-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.</p></div>","PeriodicalId":36340,"journal":{"name":"Journal of Finance and Data Science","volume":"9 ","pages":"Article 100101"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Hedging using reinforcement learning: Contextual k-armed bandit versus Q-learning\",\"authors\":\"Loris Cannelli ,&nbsp;Giuseppe Nuti ,&nbsp;Marzio Sala ,&nbsp;Oleg Szehr\",\"doi\":\"10.1016/j.jfds.2023.100101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but is also undesirable due to high transaction costs. A variety of methods have been proposed to balance between effective replication and losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, where particular attention is given to Recurrent Neural Network systems and variations of the <em>Q</em>-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, the run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. 
In this article, the hedging problem is viewed as an instance of a risk-averse contextual <em>k</em>-armed bandit problem, which is motivated by the simplicity and sample-efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the <em>k</em>-armed bandit model naturally fits to the Profit and Loss formulation of hedging, providing for a more accurate and sample efficient approach than <em>Q</em>-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.</p></div>\",\"PeriodicalId\":36340,\"journal\":{\"name\":\"Journal of Finance and Data Science\",\"volume\":\"9 \",\"pages\":\"Article 100101\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Finance and Data Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S240591882300017X\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"Mathematics\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Finance and Data Science","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S240591882300017X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"Mathematics","Score":null,"Total":0}
Citations: 6

Abstract


The construction of replication strategies for contingent claims in the presence of risk and market friction is a key problem of financial engineering. In real markets, continuous replication, such as in the model of Black, Scholes and Merton (BSM), is not only unrealistic but also undesirable due to high transaction costs. A variety of methods have been proposed to balance effective replication against losses in the incomplete market setting. With the rise of Artificial Intelligence (AI), AI-based hedgers have attracted considerable interest, with particular attention given to Recurrent Neural Network systems and variations of the Q-learning algorithm. From a practical point of view, sufficient samples for training such an AI can only be obtained from a simulator of the market environment. Yet if an agent is trained solely on simulated data, its run-time performance will primarily reflect the accuracy of the simulation, which leads to the classical problem of model choice and calibration. In this article, the hedging problem is viewed as an instance of a risk-averse contextual k-armed bandit problem, a choice motivated by the simplicity and sample efficiency of the architecture, which allows for realistic online model updates from real-world data. We find that the k-armed bandit model fits naturally to the Profit and Loss formulation of hedging, providing a more accurate and sample-efficient approach than Q-learning and reducing to the Black-Scholes model in the absence of transaction costs and risks.
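As a reading aid only, below is a minimal Python sketch of a risk-averse contextual k-armed bandit hedger in the spirit described above. It is not the authors' implementation: the class name, the epsilon-greedy exploration scheme, the discrete context bucketing, and the mean-minus-std risk adjustment are all illustrative assumptions. Context is taken to be a discrete bucket (e.g. moneyness and time to expiry), the k arms are discretized hedge ratios, and the reward is the one-step hedging P&L.

    import numpy as np
    from collections import defaultdict

    class RiskAverseContextualBandit:
        # Epsilon-greedy k-armed bandit keeping, per (context, arm), a running
        # count, mean, and M2 (Welford's algorithm) of the observed rewards.
        # Exploitation maximizes the risk-adjusted score mean - lam * std.
        def __init__(self, k, epsilon=0.1, risk_aversion=1.0, seed=0):
            self.k = k
            self.epsilon = epsilon
            self.lam = risk_aversion
            self.rng = np.random.default_rng(seed)
            self.stats = defaultdict(lambda: np.zeros((self.k, 3)))  # [n, mean, M2]

        def select(self, context):
            s = self.stats[context]
            untried = np.flatnonzero(s[:, 0] == 0)
            if untried.size > 0:                  # try every arm once first
                return int(self.rng.choice(untried))
            if self.rng.random() < self.epsilon:  # occasional exploration
                return int(self.rng.integers(self.k))
            std = np.sqrt(s[:, 2] / np.maximum(s[:, 0] - 1.0, 1.0))
            return int(np.argmax(s[:, 1] - self.lam * std))

        def update(self, context, arm, reward):
            n, mean, m2 = self.stats[context][arm]
            n += 1.0
            delta = reward - mean
            mean += delta / n
            m2 += delta * (reward - mean)
            self.stats[context][arm] = (n, mean, m2)

    # Toy training loop: arms are discretized hedge ratios for a short option
    # position in one fixed context bucket; the reward is the negative absolute
    # replication error under a synthetic one-step price move.
    hedge_ratios = np.linspace(0.0, 1.0, 11)
    agent = RiskAverseContextualBandit(k=len(hedge_ratios))
    context = (1.0, 4)       # hypothetical (moneyness bucket, weeks to expiry)
    true_delta = 0.6         # stand-in for the BSM delta of this context
    rng = np.random.default_rng(1)
    for _ in range(2000):
        arm = agent.select(context)
        move = rng.normal()  # synthetic underlying return
        pnl = -abs((hedge_ratios[arm] - true_delta) * move)
        agent.update(context, arm, pnl)
    agent.epsilon = 0.0      # act greedily after training
    print("learned hedge ratio:", hedge_ratios[agent.select(context)])

In this toy setting the mean reward is highest and its standard deviation lowest at the true delta, so the greedy policy recovers the frictionless BSM hedge, mirroring the reduction to the Black-Scholes model claimed in the abstract; transaction costs and richer contexts would shift the learned ratios away from delta.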

Source journal: Journal of Finance and Data Science (Mathematics - Statistics and Probability)
CiteScore: 3.90
Self-citation rate: 0.00%
Articles per year: 15
Review time: 30 days