{"title":"QuantFactor REINFORCE:利用有方差限制的 REINFORCE 挖掘稳定的公式化阿尔法因子","authors":"Junjie Zhao, Chengxi Zhang, Min Qin, Peng Yang","doi":"arxiv-2409.05144","DOIUrl":null,"url":null,"abstract":"The goal of alpha factor mining is to discover indicative signals of\ninvestment opportunities from the historical financial market data of assets.\nDeep learning based alpha factor mining methods have shown to be powerful,\nwhich, however, lack of the interpretability, making them unacceptable in the\nrisk-sensitive real markets. Alpha factors in formulaic forms are more\ninterpretable and therefore favored by market participants, while the search\nspace is complex and powerful explorative methods are urged. Recently, a\npromising framework is proposed for generating formulaic alpha factors using\ndeep reinforcement learning, and quickly gained research focuses from both\nacademia and industries. This paper first argues that the originally employed\npolicy training method, i.e., Proximal Policy Optimization (PPO), faces several\nimportant issues in the context of alpha factors mining, making it ineffective\nto explore the search space of the formula. Herein, a novel reinforcement\nlearning based on the well-known REINFORCE algorithm is proposed. Given that\nthe underlying state transition function adheres to the Dirac distribution, the\nMarkov Decision Process within this framework exhibit minimal environmental\nvariability, making REINFORCE algorithm more appropriate than PPO. A new\ndedicated baseline is designed to theoretically reduce the commonly suffered\nhigh variance of REINFORCE. Moreover, the information ratio is introduced as a\nreward shaping mechanism to encourage the generation of steady alpha factors\nthat can better adapt to changes in market volatility. Experimental evaluations\non various real assets data show that the proposed algorithm can increase the\ncorrelation with asset returns by 3.83%, and a stronger ability to obtain\nexcess returns compared to the latest alpha factors mining methods, which meets\nthe theoretical results well.","PeriodicalId":501294,"journal":{"name":"arXiv - QuantFin - Computational Finance","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE\",\"authors\":\"Junjie Zhao, Chengxi Zhang, Min Qin, Peng Yang\",\"doi\":\"arxiv-2409.05144\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The goal of alpha factor mining is to discover indicative signals of\\ninvestment opportunities from the historical financial market data of assets.\\nDeep learning based alpha factor mining methods have shown to be powerful,\\nwhich, however, lack of the interpretability, making them unacceptable in the\\nrisk-sensitive real markets. Alpha factors in formulaic forms are more\\ninterpretable and therefore favored by market participants, while the search\\nspace is complex and powerful explorative methods are urged. Recently, a\\npromising framework is proposed for generating formulaic alpha factors using\\ndeep reinforcement learning, and quickly gained research focuses from both\\nacademia and industries. 
This paper first argues that the originally employed\\npolicy training method, i.e., Proximal Policy Optimization (PPO), faces several\\nimportant issues in the context of alpha factors mining, making it ineffective\\nto explore the search space of the formula. Herein, a novel reinforcement\\nlearning based on the well-known REINFORCE algorithm is proposed. Given that\\nthe underlying state transition function adheres to the Dirac distribution, the\\nMarkov Decision Process within this framework exhibit minimal environmental\\nvariability, making REINFORCE algorithm more appropriate than PPO. A new\\ndedicated baseline is designed to theoretically reduce the commonly suffered\\nhigh variance of REINFORCE. Moreover, the information ratio is introduced as a\\nreward shaping mechanism to encourage the generation of steady alpha factors\\nthat can better adapt to changes in market volatility. Experimental evaluations\\non various real assets data show that the proposed algorithm can increase the\\ncorrelation with asset returns by 3.83%, and a stronger ability to obtain\\nexcess returns compared to the latest alpha factors mining methods, which meets\\nthe theoretical results well.\",\"PeriodicalId\":501294,\"journal\":{\"name\":\"arXiv - QuantFin - Computational Finance\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - QuantFin - Computational Finance\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.05144\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Computational Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05144","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
The goal of alpha factor mining is to discover indicative signals of investment opportunities from the historical financial market data of assets. Deep learning based alpha factor mining methods have proven powerful but lack interpretability, which makes them unacceptable in risk-sensitive real markets. Alpha factors in formulaic form are more interpretable and therefore favored by market participants, but their search space is complex, so powerful exploration methods are needed. Recently, a promising framework was proposed for generating formulaic alpha factors using deep reinforcement learning, and it quickly attracted research attention from both academia and industry. This paper first argues that the originally employed policy training method, Proximal Policy Optimization (PPO), faces several important issues in the context of alpha factor mining that make it ineffective at exploring the formula search space. Herein, a novel reinforcement learning method based on the well-known REINFORCE algorithm is proposed. Because the underlying state transition function follows a Dirac distribution, the Markov Decision Process within this framework exhibits minimal environmental variability, which makes REINFORCE more appropriate than PPO. A new dedicated baseline is designed to theoretically reduce the high variance from which REINFORCE commonly suffers. Moreover, the information ratio is introduced as a reward shaping mechanism to encourage the generation of steady alpha factors that can better adapt to changes in market volatility. Experimental evaluations on various real asset data show that the proposed algorithm increases the correlation with asset returns by 3.83% and has a stronger ability to obtain excess returns than the latest alpha factor mining methods, which matches the theoretical results well.
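To make the variance argument concrete, the following is a minimal sketch of the standard REINFORCE policy gradient with a generic baseline b; it is not the paper's dedicated baseline construction, which is not reproduced in the abstract. In formula generation, appending a token a_t to a partial formula s_t yields the next state deterministically (a Dirac transition kernel), so the policy is the only source of randomness and the choice of baseline directly governs the estimator's variance:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
      \,\bigl(G_t - b(s_t)\bigr)
    \right],
\qquad
\mathbb{E}_{a_t \sim \pi_\theta}\!\left[
  \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, b(s_t)
\right] = 0 .
```

Subtracting b(s_t) therefore leaves the gradient unbiased while potentially reducing its variance; the paper's "dedicated baseline" is a specific choice of b designed for this deterministic-transition setting.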
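The information ratio (IR) used for reward shaping is a standard quantity: the mean of excess returns over a benchmark divided by their standard deviation. Below is a hedged Python sketch; the function names (information_ratio, shaped_reward), the blending weight lam, and the use of an information-coefficient term ic are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def information_ratio(portfolio_returns, benchmark_returns, eps=1e-12):
    """IR = mean(excess returns) / std(excess returns)."""
    excess = np.asarray(portfolio_returns) - np.asarray(benchmark_returns)
    return excess.mean() / (excess.std(ddof=1) + eps)

def shaped_reward(ic, portfolio_returns, benchmark_returns, lam=0.5):
    """Hypothetical reward shaping: blend a correlation-based score (e.g. an
    information coefficient, ic) with the IR so that factors earning steady
    excess returns are rewarded over volatile ones. lam is illustrative."""
    ir = information_ratio(portfolio_returns, benchmark_returns)
    return (1.0 - lam) * ic + lam * ir
```

A factor whose excess returns are steady (small tracking error) scores a higher IR than one with the same mean return but larger swings, which is exactly the "steady alpha factors" property the reward shaping is meant to encourage.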