Stock Market Trading Agent Using On-Policy Reinforcement Learning Algorithms

Shreyas Lele, Kavit Gangar, Harsh Daftary, Dewashish Dharkar
{"title":"Stock Market Trading Agent Using On-Policy Reinforcement Learning Algorithms","authors":"Shreyas Lele, Kavit Gangar, Harsh Daftary, Dewashish Dharkar","doi":"10.2139/ssrn.3582014","DOIUrl":null,"url":null,"abstract":"Stock market has been a complex system which has been difficult to predict for humans, thereby, making the trading decisions difficult to take. It will be useful for traders if there is a model agent which can learn the stock market trends and suggest trading decisions which in turn maximizes the profits. Inorder to develop this agent we have formulated the problem as a Markov Decision Process (MDP) and created a stock trading environment which serves as a platform for this agent to trade the stocks. In this paper, we introduce a Reinforcement Learning based approach to develop a trading agent which performs trading actions on the environment and learns according to the rewards in terms of profit or loss it receives. We have applied different On-policy Reinforcement Learning Algorithms such as Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO) on the environment to obtain the profits while trading stocks for 3 companies viz. Apple, Microsoft and Nike. 
The performance of these algorithms in order to maximize the profits have been evaluated and the results and conclusions have been elaborated.","PeriodicalId":241211,"journal":{"name":"CompSciRN: Artificial Intelligence (Topic)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"CompSciRN: Artificial Intelligence (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3582014","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The stock market is a complex system that is difficult for humans to predict, which in turn makes trading decisions hard to take. Traders would benefit from a model agent that can learn stock market trends and suggest trading decisions that maximize profit. To develop this agent, we formulate the problem as a Markov Decision Process (MDP) and create a stock trading environment that serves as a platform for the agent to trade stocks. In this paper, we introduce a Reinforcement Learning based approach to develop a trading agent that performs trading actions on the environment and learns from the rewards, in terms of profit or loss, that it receives. We apply different on-policy Reinforcement Learning algorithms, namely Vanilla Policy Gradient (VPG), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), to the environment to obtain profits while trading the stocks of three companies: Apple, Microsoft and Nike. The performance of these algorithms in maximizing profit is evaluated, and the results and conclusions are elaborated.
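To make the MDP formulation described above concrete, the sketch below shows a minimal trading environment in the usual reset/step style: the state is a window of recent prices plus the current position, actions are hold/buy/sell, and the reward is the realized profit or loss on a sell. All names, the single-share position limit, and the window size are illustrative assumptions, not the authors' actual environment.

```python
# Minimal MDP-style stock trading environment (illustrative sketch only).
class TradingEnv:
    ACTIONS = ("hold", "buy", "sell")  # action indices 0, 1, 2

    def __init__(self, prices, window=5):
        self.prices = prices   # list of historical closing prices
        self.window = window   # observation lookback length
        self.reset()

    def reset(self):
        self.t = self.window   # current time step
        self.position = 0      # shares held (0 or 1 in this sketch)
        self.entry_price = 0.0
        return self._state()

    def _state(self):
        # Observation: last `window` prices plus a position flag.
        return self.prices[self.t - self.window:self.t] + [self.position]

    def step(self, action):
        price = self.prices[self.t]
        reward = 0.0
        if action == 1 and self.position == 0:    # buy one share
            self.position, self.entry_price = 1, price
        elif action == 2 and self.position == 1:  # sell the share
            reward = price - self.entry_price     # realized profit/loss
            self.position = 0
        self.t += 1
        done = self.t >= len(self.prices)
        return self._state(), reward, done
```

An on-policy agent (VPG, TRPO, or PPO) would repeatedly roll out episodes in such an environment and update its policy from the collected (state, action, reward) trajectories.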