QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors With Variance-Bounded REINFORCE

IF 5.8 Zone 2 (Engineering & Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Signal Processing Pub Date : 2025-06-04 DOI:10.1109/TSP.2025.3576781

Junjie Zhao;Chengxi Zhang;Min Qin;Peng Yang

Volume 73, pages 2448-2463. Full text: https://ieeexplore.ieee.org/document/11024173/

Citations: 0

Abstract

Alpha factor mining aims to discover investment signals in historical financial market data that can be used to predict asset returns and earn excess profits. Powerful deep learning methods for alpha factor mining lack interpretability, which makes them unacceptable in risk-sensitive real markets. Formulaic alpha factors are preferred for their interpretability, but their search space is complex and calls for powerful exploratory methods. Recently, a promising framework was proposed for generating formulaic alpha factors using deep reinforcement learning, and it quickly attracted research attention from both academia and industry. This paper first argues that the originally employed policy training method, Proximal Policy Optimization (PPO), faces several important issues in the context of alpha factor mining. A novel reinforcement learning algorithm based on the well-known REINFORCE algorithm is then proposed. REINFORCE uses Monte Carlo sampling to estimate the policy gradient, which yields unbiased but high-variance estimates. Because the underlying state transition function follows a Dirac distribution, environmental variability is minimal, which helps alleviate this high variance and makes REINFORCE more appropriate than PPO. A new dedicated baseline is designed to theoretically reduce the high variance that REINFORCE commonly suffers from. Moreover, the information ratio is introduced as a reward shaping mechanism to encourage the generation of steady alpha factors that adapt better to changes in market volatility. Evaluations on real asset data indicate that the proposed algorithm boosts correlation with returns by 3.83% and has a stronger ability to obtain excess returns than the latest alpha factor mining methods, which agrees well with the theoretical results.
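The abstract's two main ideas, a variance-reducing baseline for REINFORCE and an information-ratio reward, can be illustrated with a small sketch. This is not the authors' algorithm. The abstract does not specify the paper's dedicated baseline, so the expected reward under the policy stands in for it here. The per-period correlation series are made-up numbers, and computing the information ratio as mean divided by standard deviation of such a series is an assumption. Because each action maps to one fixed reward (a deterministic, Dirac-like environment), the estimator's mean and variance can be computed exactly by enumerating actions rather than by sampling.

```python
import math
from statistics import mean, pstdev


def softmax(logits):
    """Convert logits to action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def information_ratio(series):
    """Mean of a series divided by its population standard deviation."""
    sd = pstdev(series)
    return mean(series) / sd if sd > 0 else 0.0


def grad_log_prob(probs, action):
    """Gradient of log pi(action) with respect to the softmax logits."""
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def estimator_moments(probs, rewards, baseline):
    """Exact mean vector and total variance of the one-sample estimator
    (R(a) - baseline) * grad log pi(a), found by enumerating every action."""
    k = len(probs)
    samples = []
    for a in range(k):
        g = grad_log_prob(probs, a)
        samples.append([(rewards[a] - baseline) * gi for gi in g])
    mean_vec = [sum(probs[a] * samples[a][i] for a in range(k)) for i in range(k)]
    var = sum(
        probs[a] * sum((samples[a][i] - mean_vec[i]) ** 2 for i in range(k))
        for a in range(k)
    )
    return mean_vec, var


# Hypothetical per-period correlations with returns for three candidate factors.
ic_series = [
    [0.05, 0.04, 0.06, 0.05],    # steady
    [0.15, -0.05, 0.12, -0.02],  # same average as the first, but volatile
    [0.01, 0.02, 0.00, 0.01],    # steady but weak
]

# Reward shaping (assumed form): each factor is rewarded by its information ratio.
rewards = [information_ratio(s) for s in ic_series]

# Uniform initial policy over the three candidates.
probs = softmax([0.0, 0.0, 0.0])

# Stand-in baseline: the expected reward under the current policy.
baseline = sum(p * r for p, r in zip(probs, rewards))

mean_plain, var_plain = estimator_moments(probs, rewards, 0.0)
mean_base, var_base = estimator_moments(probs, rewards, baseline)

print("rewards:", rewards)
print("variance without baseline:", var_plain)
print("variance with baseline:", var_base)
```

In this toy setting the baseline leaves the expected gradient unchanged while shrinking its variance, and the information-ratio reward ranks the steady factor above the volatile one even though both have the same average correlation.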

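Source journal

IEEE Transactions on Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.20

Self-citation rate: 9.30%

Articles published: 310

Review time: 3.0 months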

Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.