Risk-Sensitive Portfolio Management by Using C51 Algorithm

IF 0.6 · CAS Zone 4 (Multidisciplinary) · JCR Q3 MULTIDISCIPLINARY SCIENCES
Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn
{"title":"基于C51算法的风险敏感投资组合管理","authors":"Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn","doi":"10.12982/cmjs.2022.094","DOIUrl":null,"url":null,"abstract":"Financial trading is one of the most popular problems for reinforcement learning in recent years. One of the important challenges is that investment is a multi-objective problem. That is, professional investors do not act solely on expected profi t but also carefully consider the potential risk of a given investment. To handle such a challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio as computed by a fi xed length of previous returns. This work proposes a new approach to deal with the profi t-to-risk tradeoff by applying distributional reinforcement learning to build a risk awareness policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, is to select the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a signifi cantly higher Sharpe ratio and lower maximum drawdown without sacrifi cing profi t compared to the C51algorithm utilizing a purely profi t-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. Besides the policy, we also studied the effect of using double networks and the choice of exploration strategies with our approach to identify the optimal training confi guration. We fi nd that the epsilon-greedy policy is the most suitable exploration for C51-Sharpe and that the use of double network has no signifi cant impact on performance. Our study provides statistical evidence of the effi ciency in risk-sensitive policy implemented by using distributional reinforcement algorithms along with an optimized training process.","PeriodicalId":9884,"journal":{"name":"Chiang Mai Journal of Science","volume":"13 1","pages":""},"PeriodicalIF":0.6000,"publicationDate":"2022-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Risk-Sensitive Portfolio Management by Using C51 Algorithm\",\"authors\":\"Thammasorn Harnpadungkij, Warasinee Chaisangmongkon, P. Phunchongharn\",\"doi\":\"10.12982/cmjs.2022.094\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Financial trading is one of the most popular problems for reinforcement learning in recent years. One of the important challenges is that investment is a multi-objective problem. That is, professional investors do not act solely on expected profi t but also carefully consider the potential risk of a given investment. To handle such a challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio as computed by a fi xed length of previous returns. This work proposes a new approach to deal with the profi t-to-risk tradeoff by applying distributional reinforcement learning to build a risk awareness policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, is to select the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a signifi cantly higher Sharpe ratio and lower maximum drawdown without sacrifi cing profi t compared to the C51algorithm utilizing a purely profi t-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. 
Besides the policy, we also studied the effect of using double networks and the choice of exploration strategies with our approach to identify the optimal training confi guration. We fi nd that the epsilon-greedy policy is the most suitable exploration for C51-Sharpe and that the use of double network has no signifi cant impact on performance. Our study provides statistical evidence of the effi ciency in risk-sensitive policy implemented by using distributional reinforcement algorithms along with an optimized training process.\",\"PeriodicalId\":9884,\"journal\":{\"name\":\"Chiang Mai Journal of Science\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2022-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Chiang Mai Journal of Science\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.12982/cmjs.2022.094\",\"RegionNum\":4,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chiang Mai Journal of Science","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.12982/cmjs.2022.094","RegionNum":4,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0

Abstract

Financial trading is one of the most popular problems for reinforcement learning in recent years. One of the important challenges is that investment is a multi-objective problem. That is, professional investors do not act solely on expected profit but also carefully consider the potential risk of a given investment. To handle such a challenge, previous studies have explored various kinds of risk-sensitive rewards, for example, the Sharpe ratio as computed over a fixed-length window of previous returns. This work proposes a new approach to the profit-to-risk tradeoff by applying distributional reinforcement learning to build a risk-aware policy instead of a simple risk-based reward function. Our new policy, termed C51-Sharpe, selects the action based on the Sharpe ratio computed from the probability mass function of the return. This produces a significantly higher Sharpe ratio and lower maximum drawdown without sacrificing profit compared to the C51 algorithm utilizing a purely profit-based policy. Moreover, it can outperform other benchmarks, such as a Deep Q-Network (DQN) with a Sharpe ratio reward function. Besides the policy, we also studied the effect of using double networks and the choice of exploration strategy with our approach to identify the optimal training configuration. We find that the epsilon-greedy policy is the most suitable exploration strategy for C51-Sharpe and that the use of a double network has no significant impact on performance. Our study provides statistical evidence of the efficiency of the risk-sensitive policy implemented using distributional reinforcement learning algorithms along with an optimized training process.
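
The abstract does not spell out implementation details, but the C51-Sharpe policy it describes reduces to a small computation on the output of a C51 network: each action's return is represented as a probability mass function p_i(a) over a fixed support of 51 atoms z_i, the policy scores each action by mu_a / sigma_a with mu_a = sum_i p_i(a) z_i, and exploration is epsilon-greedy. The sketch below illustrates that idea; the support bounds, function names, and dummy network output are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Hypothetical sketch of C51-Sharpe action selection (names and support
# bounds are illustrative, not taken from the paper's implementation).
# C51 represents each action's return as a categorical distribution over
# a fixed support of 51 atoms; the policy picks the action whose
# distribution has the best mean-to-standard-deviation ratio.

N_ATOMS = 51
V_MIN, V_MAX = -1.0, 1.0                    # assumed support bounds
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)  # atoms z_i, shared by all actions

def sharpe_from_pmf(probs: np.ndarray) -> np.ndarray:
    """Mean/std ratio of each action's categorical return distribution.

    probs: (n_actions, N_ATOMS) array; each row is a PMF over ATOMS.
    """
    mean = probs @ ATOMS                      # mu_a = sum_i p_i(a) * z_i
    var = probs @ ATOMS**2 - mean**2          # E[z^2] - mu^2
    std = np.sqrt(np.maximum(var, 1e-12))     # guard against zero variance
    return mean / std

def select_action(probs: np.ndarray, epsilon: float,
                  rng: np.random.Generator) -> int:
    """Epsilon-greedy over the Sharpe score instead of the mean return."""
    n_actions = probs.shape[0]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))            # explore uniformly
    return int(np.argmax(sharpe_from_pmf(probs)))      # exploit Sharpe score

# Usage with dummy network output (softmax over random logits):
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, N_ATOMS))
pmf = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
action = select_action(pmf, epsilon=0.05, rng=rng)
```

Note that the mean and standard deviation both come from the same forward pass of the distributional network, so this policy costs essentially nothing beyond the standard C51 greedy rule, which takes the argmax of the mean alone.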
Source journal
Chiang Mai Journal of Science (MULTIDISCIPLINARY SCIENCES)
CiteScore: 1.00
Self-citation rate: 25.00%
Articles per year: 103
Review time: 3 months
About the journal: The Chiang Mai Journal of Science is an international English-language peer-reviewed journal published in open-access electronic format 6 times a year, in January, March, May, July, September and November, by the Faculty of Science, Chiang Mai University. Manuscripts in most areas of science are welcomed, except in areas such as agriculture, engineering and medical science, which are outside the scope of the Journal. Currently, the focus is on manuscripts in biology, chemistry, physics, materials science and environmental science. Papers in mathematics, statistics and computer science are also included but should be of an applied nature rather than purely theoretical. Manuscripts describing experiments on humans or animals are required to provide proof that all experiments have been carried out according to the ethical regulations of the respective institutional and/or governmental authorities, and this should be clearly stated in the manuscript itself. The Editor reserves the right to reject manuscripts that fail to do so.