Multi-agent reinforcement learning in a realistic limit order book market simulation

Michael Karpe, Jin Fang, Zhongyao Ma, Chen Wang
{"title":"Multi-agent reinforcement learning in a realistic limit order book market simulation","authors":"Michael Karpe, Jin Fang, Zhongyao Ma, Chen Wang","doi":"10.1145/3383455.3422570","DOIUrl":null,"url":null,"abstract":"Optimal order execution is widely studied by industry practitioners and academic researchers because it determines the profitability of investment decisions and high-level trading strategies, particularly those involving large volumes of orders. However, complex and unknown market dynamics pose significant challenges for the development and validation of optimal execution strategies. In this paper, we propose a model-free approach by training Reinforcement Learning (RL) agents in a realistic market simulation environment with multiple agents. First, we configure a multi-agent historical order book simulation environment for execution tasks built on an Agent-Based Interactive Discrete Event Simulation (ABIDES) [6]. Second, we formulate the problem of optimal execution in an RL setting where an intelligent agent can make order execution and placement decisions based on market microstructure trading signals in High Frequency Trading (HFT). Third, we develop and train an RL execution agent using the Double Deep Q-Learning (DDQL) algorithm in the ABIDES environment. In some scenarios, our RL agent converges towards a Time-Weighted Average Price (TWAP) strategy. Finally, we evaluate the simulation with our RL agent by comparing it with a market replay simulation using real market Limit Order Book (LOB) data.","PeriodicalId":447950,"journal":{"name":"Proceedings of the First ACM International Conference on AI in Finance","volume":"115 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"27","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the First ACM International Conference on AI in Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3383455.3422570","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 27

Abstract

Optimal order execution is widely studied by industry practitioners and academic researchers because it determines the profitability of investment decisions and high-level trading strategies, particularly those involving large volumes of orders. However, complex and unknown market dynamics pose significant challenges for the development and validation of optimal execution strategies. In this paper, we propose a model-free approach by training Reinforcement Learning (RL) agents in a realistic market simulation environment with multiple agents. First, we configure a multi-agent historical order book simulation environment for execution tasks built on an Agent-Based Interactive Discrete Event Simulation (ABIDES) [6]. Second, we formulate the problem of optimal execution in an RL setting where an intelligent agent can make order execution and placement decisions based on market microstructure trading signals in High Frequency Trading (HFT). Third, we develop and train an RL execution agent using the Double Deep Q-Learning (DDQL) algorithm in the ABIDES environment. In some scenarios, our RL agent converges towards a Time-Weighted Average Price (TWAP) strategy. Finally, we evaluate the simulation with our RL agent by comparing it with a market replay simulation using real market Limit Order Book (LOB) data.
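The execution task described above is naturally framed as an episodic Markov decision process. Below is a minimal, self-contained sketch of one such formulation, not the authors' environment: the state features (remaining inventory, remaining time, spread, book imbalance), the discrete child-order actions, and the per-step profit-and-loss reward are illustrative assumptions standing in for the LOB signals that ABIDES would supply.

```python
import numpy as np

class ExecutionEnv:
    """Toy episodic MDP for order execution (illustrative, not the paper's code).

    State:  fraction of inventory left, fraction of time left, and two
            stand-in microstructure signals (spread, book imbalance).
    Action: index into a discrete set of child-order sizes (TWAP multiples).
    Reward: signed P&L of the child order versus the arrival price (sell side).
    """

    def __init__(self, total_shares=10_000, n_steps=60,
                 child_sizes=(0.0, 0.5, 1.0, 2.0)):
        self.total_shares = total_shares
        self.n_steps = n_steps
        self.child_sizes = child_sizes            # multiples of one TWAP slice
        self.twap_slice = total_shares / n_steps

    def reset(self):
        self.inventory = float(self.total_shares)
        self.t = 0
        self.arrival_price = 100.0                # placeholder arrival mid-price
        return self._obs()

    def step(self, action):
        qty = min(self.inventory, self.child_sizes[action] * self.twap_slice)
        fill_price = self.arrival_price + np.random.normal(0.0, 0.02)  # toy fill
        self.inventory -= qty
        self.t += 1
        done = self.t >= self.n_steps or self.inventory <= 0.0
        reward = qty * (fill_price - self.arrival_price)  # selling above arrival is rewarded
        return self._obs(), reward, done

    def _obs(self):
        spread = np.random.uniform(0.01, 0.05)    # stand-ins for the real LOB
        imbalance = np.random.uniform(-1.0, 1.0)  # features ABIDES would provide
        return np.array([self.inventory / self.total_shares,
                         1.0 - self.t / self.n_steps, spread, imbalance])
```

An agent interacting with this skeleton observes a four-dimensional state, picks a child-order size each step, and is scored on the price obtained relative to the arrival price, which is the usual implementation-shortfall objective.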
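Double Deep Q-Learning (DDQL) addresses the overestimation bias of vanilla Q-learning by decoupling action selection from action evaluation: the online network picks the next action and the target network scores it, giving the target y = r + γ · Q_target(s', argmax_a Q_online(s', a)). Below is a minimal, framework-agnostic sketch of this rule (a generic double-DQN target computation, not the authors' training code; `q_online` and `q_target` are assumed callables returning a per-action value array for each state):

```python
import numpy as np

def ddql_targets(batch, q_online, q_target, gamma=0.99):
    """Double DQN targets (van Hasselt et al., 2016): the online network
    selects the next action, the target network evaluates it."""
    states, actions, rewards, next_states, dones = batch  # numpy arrays; dones in {0, 1}
    next_actions = np.argmax(q_online(next_states), axis=1)            # selection
    next_values = q_target(next_states)[np.arange(len(next_actions)),  # evaluation
                                        next_actions]
    return rewards + gamma * (1.0 - dones) * next_values
```

Fitting the online network's Q(s, a) toward these targets, and periodically copying its weights into the target network, completes one DDQL training step.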
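Time-Weighted Average Price (TWAP), the baseline the RL agent converges toward in some scenarios, simply splits the parent order into equal child orders across the trading horizon. A short sketch (any rounding convention works; here the remainder goes to the earliest slices):

```python
def twap_schedule(total_shares: int, n_slices: int) -> list[int]:
    """Equal child orders over the horizon; the remainder is spread
    over the first slices so the sizes differ by at most one share."""
    base, rem = divmod(total_shares, n_slices)
    return [base + (1 if i < rem else 0) for i in range(n_slices)]

# twap_schedule(10_000, 60) -> forty slices of 167 shares, then twenty of 166
```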