Deep Reinforcement Learning for Finance and the Efficient Market Hypothesis

L. Odermatt, Jetmir Beqiraj, Joerg Osterrieder
{"title":"Deep Reinforcement Learning for Finance and the Efficient Market Hypothesis","authors":"L. Odermatt, Jetmir Beqiraj, Joerg Osterrieder","doi":"10.2139/ssrn.3865019","DOIUrl":null,"url":null,"abstract":"Is there an informational gain by training a Deep Reinforcement Learning agent for automated stock trading using other time series than the one to be traded? In this work, we implement a DRL algorithm in a solid framework within a model-free and actor-critic approach and learn it with 21 global Multi Assets to predict and trade on the S&amp;P 500. The Efficient Market Hypothesis sets out that it is impossible to gather more information from the broader input. We demand to learn a DRL agent on this index with and without the additional information of these several Multi Assets to determine if the agent could capture invisible dependencies to end up with an informational gain and a better performance.<br>The aim of this work is not to tune the hyperparameters of a DRL agent; several papers already exist on this subject. Nevertheless, we use a proven setup as model architecture. We take a Multi Layer Perceptron (short: MLP) as the neural network architecture with two hidden layers and 64 neurons each layer. The activation function used is the hyperbolic tangent. Further, Proximal Policy Optimization (short: PPO) is used as the policy for simple implementation and enabling a continuous state space. To deal with uncertainties of neural nets, we learn 100 agents for each scenario and compared both results. Neither the Sharpe ratios nor the cumulative returns are better in the more complex approach with the additional information of the Multi Assets, and even the single approach performed marginally better. However, we demonstrate that the complexly learned agent delivers less scattering over the 100 simulations in terms of the risk-adjusted returns, so there is an informational gain due to Multi Assets. A DRL agent learned with additional information delivers more robust results compared to the taken risk. We deliver valuable results for the further development of Deep Reinforcement Learning and provide a unique and resourceful approach.","PeriodicalId":260048,"journal":{"name":"Capital Markets: Market Efficiency eJournal","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Capital Markets: Market Efficiency eJournal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3865019","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Is there an informational gain from training a Deep Reinforcement Learning (DRL) agent for automated stock trading on time series other than the one to be traded? In this work, we implement a DRL algorithm in a solid, model-free, actor-critic framework and train it on 21 global Multi Assets to predict and trade the S&P 500. The Efficient Market Hypothesis states that no additional information can be extracted from this broader input. We train a DRL agent on the index with and without the additional information of these Multi Assets to determine whether the agent can capture hidden dependencies and thereby achieve an informational gain and better performance.
The aim of this work is not to tune the hyperparameters of a DRL agent; several papers already exist on that subject. Instead, we use a proven setup as the model architecture: a Multi-Layer Perceptron (MLP) with two hidden layers of 64 neurons each and the hyperbolic tangent as activation function. Proximal Policy Optimization (PPO) is used as the policy algorithm because it is simple to implement and supports a continuous state space. To deal with the uncertainties of neural networks, we train 100 agents for each scenario and compare the results. Neither the Sharpe ratios nor the cumulative returns are better in the more complex approach with the additional information of the Multi Assets; the single-index approach even performs marginally better. However, we show that the agents trained with the additional information exhibit less dispersion in risk-adjusted returns across the 100 simulations, so there is an informational gain from the Multi Assets: a DRL agent trained with additional information delivers more robust results relative to the risk taken. We deliver valuable results for the further development of Deep Reinforcement Learning and provide a unique and resourceful approach.
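
The paper does not publish an implementation; a minimal sketch of the setup described above, assuming the stable-baselines3 PPO implementation and a hypothetical stand-in trading environment (DummyTradingEnv below is not from the paper), could look like this:

```python
# Minimal sketch of the setup described in the abstract -- not the authors' code.
# Assumes stable-baselines3's PPO and a hypothetical stand-in trading environment.
import numpy as np
import gymnasium as gym
import torch.nn as nn
from stable_baselines3 import PPO


class DummyTradingEnv(gym.Env):
    """Hypothetical stand-in for the paper's trading environment.

    Observation: returns of the S&P 500 plus (optionally) 21 Multi Assets.
    Action: a position in [-1, 1]; reward: position times the next return.
    """

    def __init__(self, n_assets=22):
        super().__init__()
        self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(n_assets,), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()
        reward = float(action[0]) * float(obs[0])  # position * S&P 500 return
        return obs, reward, False, False, {}


env = DummyTradingEnv()

# MLP policy with two hidden layers of 64 neurons each and tanh activations,
# trained with PPO, as stated in the abstract.
model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64], activation_fn=nn.Tanh),
    verbose=0,
)
model.learn(total_timesteps=10_000)  # training budget is an assumption
```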
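The comparison criterion rests on risk-adjusted returns over 100 independently trained agents per scenario. A small helper for computing each agent's Sharpe ratio and the dispersion across agents, again only a sketch with an assumed annualization factor, zero risk-free rate, and purely synthetic example data, might look like this:

```python
# Sketch of the evaluation criterion: annualized Sharpe ratio per trained
# agent and its dispersion across the 100 runs. The annualization factor
# (252 trading days) and zero risk-free rate are assumptions; the paper
# does not state them.
import numpy as np


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio of a series of periodic returns."""
    excess = np.asarray(returns) - risk_free
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


def sharpe_dispersion(per_agent_returns):
    """Mean and standard deviation of the Sharpe ratio over all agents."""
    sharpes = np.array([sharpe_ratio(r) for r in per_agent_returns])
    return sharpes.mean(), sharpes.std(ddof=1)


# Usage with purely synthetic daily return series (not the paper's results).
rng = np.random.default_rng(0)
single_asset_runs = [rng.normal(5e-4, 1e-2, 250) for _ in range(100)]
multi_asset_runs = [rng.normal(5e-4, 8e-3, 250) for _ in range(100)]
print(sharpe_dispersion(single_asset_runs))
print(sharpe_dispersion(multi_asset_runs))
```

Under the abstract's finding, the multi-asset scenario would show a similar or slightly lower mean Sharpe ratio but a smaller standard deviation across the 100 agents.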