Benchmark Dataset for Short-Term Market Prediction of Limit Order Book in China Markets

The Journal of Financial Data Science | Pub Date: 2021-09-05 | DOI: 10.3905/jfds.2021.1.074

Charles Huang, Weifeng Ge, Hongsong Chou, Xin Du


Citations: 4

Abstract

Limit order books (LOBs) generate large volumes of financial data for analysis and prediction by both the academic community and industry practitioners. This article presents a benchmark LOB dataset from the Chinese stock market, covering a few thousand stocks for the period from June to September 2020. Experiment protocols are designed for model performance evaluation: at the end of every second, the task is to forecast the upcoming volume-weighted average price (VWAP) change and volume over 12 horizons ranging from 1 second to 300 seconds. Results from a linear regression model and deep learning models are compared. A practical short-term trading strategy framework based on the generated alpha signal is presented. The data and code are available on GitHub (github.com/HKGSAS).

Key Findings

▪ Researchers lack a benchmark high-frequency LOB dataset and model against which to objectively assess prediction performance; this article serves to bridge that gap.
▪ A more practically effective set of features is proposed to capture both LOB snapshots and periodic data. The prediction target in the published literature is similarly too simplistic: it is the direction of the mid-price change over the next few events, which is not suitable for a practical trading strategy. The authors propose instead to predict the magnitude of the price change and of the volume over 12 short-term horizons.
▪ This article compares baseline linear regression with state-of-the-art deep learning models on both accuracy statistics and trading profits.
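The prediction targets described in the abstract (forward VWAP change and forward volume at the end of every second) can be illustrated with a minimal sketch. This is not the authors' code: the function name `forward_targets`, the input layout (one entry per second of traded notional, traded volume, and a reference price), and the horizon grid `HORIZONS` are all assumptions made for illustration. The source states only that there are 12 horizons between 1 and 300 seconds, and it does not specify which reference price the change is measured against.

```python
import numpy as np

# Hypothetical horizon grid (seconds). The source only says "12 horizons
# ranging from 1 second to 300 seconds"; these exact values are assumed.
HORIZONS = (1, 2, 3, 5, 10, 20, 30, 60, 120, 180, 240, 300)


def forward_targets(notional, volume, ref_price, horizons=HORIZONS):
    """Forward VWAP change and forward volume for every second and horizon.

    notional[t]  : traded value (price * quantity) during second t
    volume[t]    : traded quantity during second t
    ref_price[t] : reference price at the end of second t (assumed input,
                   e.g. a mid-price; the source does not specify it)

    Returns two (n_seconds, n_horizons) arrays. Entries are NaN where the
    forward window runs past the end of the data; the VWAP change is also
    NaN where nothing traded in the window.
    """
    notional = np.asarray(notional, dtype=float)
    volume = np.asarray(volume, dtype=float)
    ref_price = np.asarray(ref_price, dtype=float)
    n = len(volume)

    # Prefix sums with a leading zero: c[b] - c[a] sums seconds a..b-1.
    cum_notional = np.concatenate(([0.0], np.cumsum(notional)))
    cum_volume = np.concatenate(([0.0], np.cumsum(volume)))

    vwap_change = np.full((n, len(horizons)), np.nan)
    fwd_volume = np.full((n, len(horizons)), np.nan)

    for j, h in enumerate(horizons):
        m = n - h  # number of seconds t whose window t+1..t+h is complete
        if m <= 0:
            continue
        t = np.arange(m)
        win_notional = cum_notional[t + 1 + h] - cum_notional[t + 1]
        win_volume = cum_volume[t + 1 + h] - cum_volume[t + 1]
        fwd_volume[:m, j] = win_volume

        # VWAP is defined only for windows in which something traded.
        traded = win_volume > 0
        vwap = np.full(m, np.nan)
        vwap[traded] = win_notional[traded] / win_volume[traded]
        vwap_change[:m, j] = vwap / ref_price[:m] - 1.0

    return vwap_change, fwd_volume
```

Calling `forward_targets(notional, volume, ref_price)` on one stock's per-second arrays for a trading day yields one target row per second, which a regression or deep learning model could then be trained against. The prefix-sum approach keeps the cost linear in the number of seconds for each horizon, rather than re-summing every window.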

