{"title":"Deviations from the Nash equilibrium and emergence of tacit collusion in a two-player optimal execution game with reinforcement learning","authors":"Fabrizio Lillo, Andrea Macrì","doi":"arxiv-2408.11773","DOIUrl":null,"url":null,"abstract":"The use of reinforcement learning algorithms in financial trading is becoming\nincreasingly prevalent. However, the autonomous nature of these algorithms can\nlead to unexpected outcomes that deviate from traditional game-theoretical\npredictions and may even destabilize markets. In this study, we examine a\nscenario in which two autonomous agents, modeled with Double Deep Q-Learning,\nlearn to liquidate the same asset optimally in the presence of market impact,\nusing the Almgren-Chriss (2000) framework. Our results show that the strategies\nlearned by the agents deviate significantly from the Nash equilibrium of the\ncorresponding market impact game. Notably, the learned strategies exhibit tacit\ncollusion, closely aligning with the Pareto-optimal solution. We further\nexplore how different levels of market volatility influence the agents'\nperformance and the equilibria they discover, including scenarios where\nvolatility differs between the training and testing phases.","PeriodicalId":501273,"journal":{"name":"arXiv - ECON - General Economics","volume":"143 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - ECON - General Economics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.11773","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modeled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, using the Almgren-Chriss (2000) framework. Our results show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit tacit collusion, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
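For readers unfamiliar with the setting, the sketch below gives the discrete-time Almgren-Chriss price dynamics extended to two sellers. The linear-impact parametrization (permanent impact \gamma, temporary impact \eta) and the assumption that both agents' order flow enters each impact term are ours; the paper's exact specification may differ.

% Two-agent discrete Almgren-Chriss dynamics (a sketch; notation is ours).
% Agent i = 1, 2 sells n_k^{(i)} shares in period k of length \tau.
\begin{align}
  S_k &= S_{k-1} + \sigma\sqrt{\tau}\,\xi_k
        - \gamma\bigl(n_k^{(1)} + n_k^{(2)}\bigr), \\
  \tilde{S}_k^{(i)} &= S_{k-1}
        - \frac{\eta}{\tau}\bigl(n_k^{(1)} + n_k^{(2)}\bigr),
\end{align}

where S_k is the unaffected price after period k, \xi_k are i.i.d. standard normal shocks scaled by the volatility \sigma, and \tilde{S}_k^{(i)} is the price agent i actually receives. Because each agent's impact degrades the other's execution price, the liquidation problem becomes a game, with distinct Nash and Pareto-optimal trading schedules.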
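The learning algorithm named in the abstract, Double Deep Q-Learning, differs from vanilla deep Q-learning only in how the bootstrap target is formed: the online network selects the next action and the target network evaluates it (van Hasselt et al., 2016). A minimal sketch of that target computation follows; the callables q_online and q_target are hypothetical stand-ins for the two networks, and nothing here is taken from the paper's implementation.

import numpy as np

def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """Compute Double DQN bootstrap targets for a batch of transitions.

    q_online, q_target: callables mapping a batch of states to
    per-action Q-value arrays of shape (batch, n_actions).
    rewards, dones: float arrays of shape (batch,), dones in {0., 1.}.
    """
    # The online network selects the greedy next action ...
    next_actions = np.argmax(q_online(next_states), axis=1)
    # ... but the slow-moving target network evaluates it, which
    # decouples selection from evaluation and reduces the
    # overestimation bias of standard Q-learning.
    next_q = q_target(next_states)[np.arange(len(next_actions)), next_actions]
    # Terminal transitions bootstrap from zero.
    return rewards + gamma * (1.0 - dones) * next_q

In an execution setting, the state would typically include remaining inventory and time, the actions would be discretized trade sizes, and the reward would be the revenue net of impact costs, so this target is agnostic to those modeling choices.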