Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework

Z. Liu
{"title":"基于夏普比率奖励框架的投资组合管理强化学习","authors":"Z. Liu","doi":"10.4108/eai.18-11-2022.2327121","DOIUrl":null,"url":null,"abstract":"— Portfolio management is a financial operation which aims at maximizing the return or optimizing the Sharpe Ratio. One widely used portfolio management strategy, Mean-Variance Optimization, also known as Modern Portfolio Theory, mainly profits by focusing on finding out the expected return and variance of stocks based on historical data to maximize Sharpe Ratio. Yet, it is not easy and accurate to simply predict future return and variance based on a formula. So, in this paper, two Models-free framework, Sharpe Ratio reward based Deep Q-Network (DQN-S) and Return reward (DQN-R) are proposed to overcome the limitations above. Deep Q-learning was employed to train a neural network to manage a stock portfolio of 10 stocks. Stock price was defined as environment of NN, weight of portfolio was defined as action of neural network agent, and reward was indicated to train the model. Traditional portfolio allocation strategy Mean Variance Optimization (MVO) and Naïve Portfolio Allocation (NPA) were also introduced as benchmark to evaluate the performance of reinforcement learning. Moreover, the extensiveness of DQN-S was discussed. The result shows that the MVO is dominating the NPA with a 5% higher annual return and 0.5 higher of Sharpe ratio, although the MDD is slightly higher, indicating the superiority of Sharpe Ratio oriented strategy.","PeriodicalId":436941,"journal":{"name":"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement Learning in Portfolio Management with Sharpe Ratio Rewarding Based Framework\",\"authors\":\"Z. Liu\",\"doi\":\"10.4108/eai.18-11-2022.2327121\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"— Portfolio management is a financial operation which aims at maximizing the return or optimizing the Sharpe Ratio. One widely used portfolio management strategy, Mean-Variance Optimization, also known as Modern Portfolio Theory, mainly profits by focusing on finding out the expected return and variance of stocks based on historical data to maximize Sharpe Ratio. Yet, it is not easy and accurate to simply predict future return and variance based on a formula. So, in this paper, two Models-free framework, Sharpe Ratio reward based Deep Q-Network (DQN-S) and Return reward (DQN-R) are proposed to overcome the limitations above. Deep Q-learning was employed to train a neural network to manage a stock portfolio of 10 stocks. Stock price was defined as environment of NN, weight of portfolio was defined as action of neural network agent, and reward was indicated to train the model. Traditional portfolio allocation strategy Mean Variance Optimization (MVO) and Naïve Portfolio Allocation (NPA) were also introduced as benchmark to evaluate the performance of reinforcement learning. Moreover, the extensiveness of DQN-S was discussed. 
The result shows that the MVO is dominating the NPA with a 5% higher annual return and 0.5 higher of Sharpe ratio, although the MDD is slightly higher, indicating the superiority of Sharpe Ratio oriented strategy.\",\"PeriodicalId\":436941,\"journal\":{\"name\":\"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China\",\"volume\":\"56 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4108/eai.18-11-2022.2327121\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 4th International Conference on Economic Management and Model Engineering, ICEMME 2022, November 18-20, 2022, Nanjing, China","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4108/eai.18-11-2022.2327121","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Portfolio management is a financial operation that aims to maximize return or optimize the Sharpe Ratio. One widely used portfolio management strategy, Mean-Variance Optimization, also known as Modern Portfolio Theory, profits mainly by estimating the expected return and variance of stocks from historical data in order to maximize the Sharpe Ratio. However, predicting future return and variance from such a formula alone is neither easy nor accurate. To overcome these limitations, this paper proposes two model-free frameworks: a Sharpe-Ratio-reward Deep Q-Network (DQN-S) and a return-reward Deep Q-Network (DQN-R). Deep Q-learning was employed to train a neural network to manage a portfolio of 10 stocks. Stock prices were defined as the environment of the neural network, the portfolio weights were defined as the action of the neural network agent, and a reward signal was specified to train the model. The traditional portfolio allocation strategies Mean-Variance Optimization (MVO) and Naïve Portfolio Allocation (NPA) were also introduced as benchmarks against which to evaluate the reinforcement learning agents. Moreover, the extensibility of DQN-S was discussed. The results show that MVO dominates NPA, with an annual return about 5% higher and a Sharpe ratio about 0.5 higher, although its maximum drawdown (MDD) is slightly higher, indicating the superiority of a Sharpe-Ratio-oriented strategy.
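The setup described above (state = stock prices, action = portfolio weights, reward = either the raw return or a Sharpe-ratio-style signal) can be illustrated with a short sketch. The following is a minimal, hypothetical example rather than the paper's actual implementation; the rolling-window length, the simulated prices, and all function names are assumptions made purely for illustration.

```python
# Minimal sketch of the two reward signals the abstract contrasts:
# a plain-return reward (as in DQN-R) and a Sharpe-ratio reward (as in DQN-S).
# All names and parameters here are illustrative assumptions, not the paper's code.
import numpy as np

def step_portfolio(prices_t, prices_t1, weights):
    """One environment step: the portfolio return earned by holding
    `weights` while prices move from prices_t to prices_t1."""
    asset_returns = prices_t1 / prices_t - 1.0
    return float(np.dot(weights, asset_returns))

def return_reward(portfolio_return):
    """DQN-R style reward: the raw one-step portfolio return."""
    return portfolio_return

def sharpe_reward(recent_returns, risk_free=0.0, eps=1e-8):
    """DQN-S style reward: Sharpe ratio over a rolling window of recent
    portfolio returns (mean excess return divided by return volatility)."""
    r = np.asarray(recent_returns, dtype=float)
    excess = r - risk_free
    return float(excess.mean() / (r.std() + eps))

# Illustrative usage with a 10-stock portfolio, matching the paper's asset count.
rng = np.random.default_rng(0)
prices = rng.uniform(50.0, 150.0, size=(61, 10))   # 60 simulated price moves
weights = np.full(10, 0.1)                          # equal-weight action (NPA-like)
returns = [step_portfolio(prices[t], prices[t + 1], weights) for t in range(60)]
print(return_reward(returns[-1]), sharpe_reward(returns[-30:]))
```

Under this reading, DQN-S would feed the agent sharpe_reward at each step while DQN-R would use return_reward, and the MVO and NPA benchmarks would be evaluated on the same price series for comparison.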