Digger-Guider: High-Frequency Factor Extraction for Stock Trend Prediction

IF 8.9 · Zone 2 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IEEE Transactions on Knowledge and Data Engineering · Pub Date: 2024-07-08 · DOI: 10.1109/TKDE.2024.3424475

Yang Liu, Chang Xu, Min Hou, Weiqing Liu, Jiang Bian, Qi Liu, Tie-Yan Liu

Vol. 36, No. 12, pp. 7973-7985 · https://ieeexplore.ieee.org/document/10589270/

Citations: 0

Abstract

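Recent years have seen growing attention to AI-based quantitative investment. Compared with traditional low-frequency data (e.g., daily or weekly), high-frequency data (e.g., minute-level) is often underutilized for low-frequency stock trend prediction, leaving vast room for improvement. However, valuable information and noise coexist in high-frequency data. The learning process of high-frequency factor extractors can easily be overwhelmed by noise, leading to overfitting. Moreover, common techniques for preventing overfitting often perform poorly on this task because they crudely restrict the model's capacity, making it hard to model the complex trading signals in high-frequency data. Designing a high-frequency factor extractor therefore poses a dilemma: a high-capacity model may easily overfit to noise, while a simple but robust model may fail to capture complex high-frequency patterns. To address these problems, we propose to maintain model capacity while preventing overfitting by constructing two components that balance information and noise through their interaction. Specifically, we propose a novel learning framework called Digger-Guider to extract informative stock representations from noisy high-frequency data. We develop a high-capacity model called Digger to extract local and detailed features from the high-frequency data, and we design a robust model called Guider to capture global tendency features and help the Digger overcome the noise. The Digger and Guider enhance each other through mutual distillation during training, serving as data-driven regularization that works well on this task. Extensive experiments on real-world datasets demonstrate that our framework can produce powerful high-frequency stock factors that significantly improve stock trend prediction performance and our understanding of the financial market.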
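The abstract describes the Digger and Guider regularizing each other through mutual distillation. The sketch below illustrates only that generic idea on toy data; it is not the authors' implementation. The paper's models are neural networks, whereas here the "Digger" is a hypothetical linear model with one weight per intraday bar and the "Guider" is a hypothetical linear model that sees only the daily mean of those bars. The synthetic data, the weight `lam`, the learning rate, and all function names are assumptions. Each model is fit to the label plus a squared-error pull toward its peer's current prediction, with the peer's prediction treated as a fixed target.

```python
import random
from typing import List, Tuple

Vec = List[float]


def mse(a: Vec, b: Vec) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def make_toy_data(n: int = 200, bars: int = 4, seed: int = 0) -> Tuple[List[Vec], Vec]:
    """Hypothetical stand-in for minute-level data: each sample has `bars`
    intraday features, and the label is their mean plus small noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(bars)]
        xs.append(x)
        ys.append(sum(x) / bars + rng.gauss(0.0, 0.1))
    return xs, ys


def predict_digger(w: Vec, b: float, xs: List[Vec]) -> Vec:
    # Higher-capacity stand-in: one weight per intraday bar.
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in xs]


def predict_guider(v: float, c: float, xs: List[Vec]) -> Vec:
    # Lower-capacity stand-in: sees only the coarse daily aggregate (bar mean).
    return [v * sum(x) / len(x) + c for x in xs]


def joint_objective(yd: Vec, yg: Vec, ys: Vec, lam: float) -> float:
    # Both supervised terms plus the shared distillation penalty.
    return mse(yd, ys) + mse(yg, ys) + lam * mse(yd, yg)


def train(xs: List[Vec], ys: Vec, lam: float = 0.5, lr: float = 0.05,
          steps: int = 600) -> Tuple[Vec, float, float, float]:
    """Simultaneous gradient steps on both models with mutual distillation."""
    n, bars = len(xs), len(xs[0])
    w, b, v, c = [0.0] * bars, 0.0, 0.0, 0.0
    for _ in range(steps):
        yd, yg = predict_digger(w, b, xs), predict_guider(v, c, xs)
        # d(loss)/d(prediction) per sample; the peer's prediction is a fixed target.
        gd = [2.0 / n * ((d - y) + lam * (d - g)) for d, g, y in zip(yd, yg, ys)]
        gg = [2.0 / n * ((g - y) + lam * (g - d)) for d, g, y in zip(yd, yg, ys)]
        # Chain rule through the linear models (all updates use the old predictions).
        w = [wj - lr * sum(gi * x[j] for gi, x in zip(gd, xs)) for j, wj in enumerate(w)]
        b -= lr * sum(gd)
        v -= lr * sum(gi * sum(x) / bars for gi, x in zip(gg, xs))
        c -= lr * sum(gg)
    return w, b, v, c


if __name__ == "__main__":
    xs, ys = make_toy_data()
    w, b, v, c = train(xs, ys)
    yd, yg = predict_digger(w, b, xs), predict_guider(v, c, xs)
    print("digger mse:", round(mse(yd, ys), 4), "guider mse:", round(mse(yg, ys), 4))
```

With a single shared `lam`, the two per-model update rules together are exactly gradient descent on `joint_objective`, which is why a small learning rate converges in this toy. Giving each model its own distillation weight, as a real setup might, removes that property.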

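Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70
Self-citation rate: 3.40%
Articles published: 515
Review time: 6 months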

Journal description: The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering aspects of computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas include knowledge-based and expert systems; AI techniques for knowledge and data management; tools and methodologies; distributed processing; real-time systems; architectures; data management practices; database design; query languages; security; fault tolerance; statistical databases; algorithms; performance evaluation; and applications.