Double Deep Q-Learning for Optimal Execution

JCR: Q3 (Mathematics)

Applied Mathematical Finance. Pub Date: 2018-12-17. DOI: 10.1080/1350486X.2022.2077783

Brian Ning, Franco Ho Ting Ling, S. Jaimungal

Applied Mathematical Finance, pp. 361-380 (journal article). https://doi.org/10.1080/1350486X.2022.2077783

Citations: 52

Abstract

Optimal trade execution is an important problem faced by essentially all traders. Much research into optimal execution uses stringent model assumptions and applies continuous-time stochastic control to solve the resulting problems. Here, we instead take a model-free approach and develop a variation of Deep Q-Learning to estimate the optimal actions of a trader. The model is a fully connected neural network trained using experience replay and Double DQN. Its input features are the current state of the limit order book, other trading signals, and the available execution actions; its output is the Q-value function, which estimates the future rewards under an arbitrary action. We apply our model to nine different stocks and find that it outperforms the standard benchmark approach on most stocks, as measured by (i) mean and median out-performance, (ii) probability of out-performance, and (iii) gain-loss ratios.
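The abstract names two training ingredients, experience replay and Double DQN, without giving details. The following is a minimal sketch of how those two pieces are commonly put together, not the authors' implementation. All names here (`Transition`, `ReplayBuffer`, `double_dqn_target`, the toy Q-functions, and the discount `gamma`) are hypothetical, and the Q-function is assumed to take a state and an action as inputs and return a scalar, which is how the abstract describes the network.

```python
import random
from collections import deque
from typing import Callable, Deque, List, NamedTuple, Sequence

# Assumption: a Q-function maps (state features, action) to one scalar value.
QFunction = Callable[[Sequence[float], float], float]


class Transition(NamedTuple):
    """One stored experience; field names are illustrative only."""
    state: Sequence[float]
    action: float
    reward: float
    next_state: Sequence[float]
    next_actions: Sequence[float]  # actions feasible in next_state
    done: bool


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self._items))
        return random.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def double_dqn_target(t: Transition, q_main: QFunction,
                      q_target: QFunction, gamma: float = 1.0) -> float:
    """Regression target for one transition under the Double DQN rule."""
    if t.done or not t.next_actions:
        return t.reward
    # The main network selects the greedy next action ...
    best = max(t.next_actions, key=lambda a: q_main(t.next_state, a))
    # ... and the target network evaluates that selected action.
    return t.reward + gamma * q_target(t.next_state, best)


# Toy Q-functions standing in for the two networks (purely illustrative).
def toy_q_main(state: Sequence[float], action: float) -> float:
    return state[0] * action - action * action


def toy_q_target(state: Sequence[float], action: float) -> float:
    return 0.5 * state[0] * action


buffer = ReplayBuffer(capacity=3)
example = Transition(state=[1.0], action=1.0, reward=0.5,
                     next_state=[2.0], next_actions=[0.0, 1.0, 2.0],
                     done=False)
buffer.push(example)
example_target = double_dqn_target(example, toy_q_main, toy_q_target, gamma=0.5)
print(example_target)
```

In this toy case the main network prefers the action 1.0, so the target uses the target network's value for that action rather than the target network's own maximum (which is attained at 2.0). Separating selection from evaluation in this way is what distinguishes Double DQN from the plain DQN target.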


Source journal

Applied Mathematical Finance (Economics, Econometrics and Finance: Finance)

CiteScore: 2.30

Self-citation rate: 0.00%


Journal description: The journal encourages the confident use of applied mathematics and mathematical modelling in finance. The journal publishes papers on the following:
• modelling of financial and economic primitives (interest rates, asset prices, etc.);
• modelling market behaviour;
• modelling market imperfections;
• pricing of financial derivative securities;
• hedging strategies;
• numerical methods;
• financial engineering.