A semi-heterogeneous ensemble forecasting method for stock returns based on sentiment analysis
Authors: Xiao Zhang, Peide Liu, Jing Feng
Journal: Information Sciences, volume 723, Article 122655 (impact factor 6.8; JCR category: Computer Science, Information Systems)
Publication date: 2025-09-11
DOI: 10.1016/j.ins.2025.122655
URL: https://www.sciencedirect.com/science/article/pii/S0020025525007881
Citation count: 0
Abstract
With the growing influence of investor sentiment on market dynamics, sentiment analysis has emerged as an effective tool for enhancing financial forecasting models. This study proposes a diversity-enhanced semi-heterogeneous ensemble forecasting framework that integrates sentiment analysis into the forecasting of stock index returns. A supervised stock market sentiment index set is constructed, in which prior knowledge regarding term importance is integrated into the data augmentation process. This assigns higher weights to sentiment-related terms with superior predictive capacity, allowing the model to prioritize more informative features and improve its forecasting performance. A series of diverse base models is generated by integrating multiple attention-PCA techniques and forecasting algorithms, based on variable perturbation strategies. These base models are then combined through a suite of ensemble strategies, forming a semi-heterogeneous ensemble model for forecasting S&P 500 returns. The experimental results demonstrate that the proposed approaches significantly outperform benchmark methods, with notable improvements in both accuracy and diversity.
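To make the ensemble idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes synthetic data in place of the sentiment index set, omits the attention-PCA step entirely, and swaps in two simple stand-in learners (a one-dimensional least-squares fit and k-nearest-neighbour regression). Every function name and parameter here is hypothetical. It only illustrates how variable perturbation (random feature subsets) and a mix of algorithms can yield diverse base models that are then combined by simple and inverse-error-weighted averaging.

```python
import random
from statistics import mean

def make_data(n=120, p=6, seed=0):
    """Synthetic stand-in: p sentiment-style features and a return-like target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    # Hypothetical ground truth: only the first two features carry signal.
    y = [0.5 * row[0] - 0.3 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y

def fit_linear(X, y, cols):
    """1-D least squares on the mean of the selected columns."""
    z = [mean(row[c] for c in cols) for row in X]
    zb, yb = mean(z), mean(y)
    var = sum((zi - zb) ** 2 for zi in z)
    slope = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y)) / var
    return lambda row: yb + slope * (mean(row[c] for c in cols) - zb)

def fit_knn(X, y, cols, k=5):
    """k-nearest-neighbour regression restricted to the selected columns."""
    def predict(row):
        dist = sorted(
            (sum((r[c] - row[c]) ** 2 for c in cols), yi) for r, yi in zip(X, y)
        )
        return mean(yi for _, yi in dist[:k])
    return predict

def mse(model, X, y):
    """Mean squared error of one model on a data set."""
    return mean((model(row) - yi) ** 2 for row, yi in zip(X, y))

def build_ensemble(X_tr, y_tr, X_val, y_val, n_subsets=4, subset_size=3, seed=1):
    """Train one model per (feature subset, algorithm) pair and weight by validation error."""
    rng = random.Random(seed)
    p = len(X_tr[0])
    models = []
    for _ in range(n_subsets):
        cols = sorted(rng.sample(range(p), subset_size))  # variable perturbation
        for fit in (fit_linear, fit_knn):                 # different algorithms
            models.append(fit(X_tr, y_tr, cols))
    inv = [1.0 / (mse(m, X_val, y_val) + 1e-12) for m in models]
    total = sum(inv)
    weights = [v / total for v in inv]  # positive, sums to 1
    return models, weights

def predict_simple(models, row):
    """Equal-weight combination of the base predictions."""
    return mean(m(row) for m in models)

def predict_weighted(models, weights, row):
    """Inverse-validation-error weighted combination."""
    return sum(w * m(row) for m, w in zip(models, weights))

# Chronological split: train, validation, test.
X, y = make_data()
X_tr, y_tr = X[:70], y[:70]
X_val, y_val = X[70:95], y[70:95]
X_te, y_te = X[95:], y[95:]

models, weights = build_ensemble(X_tr, y_tr, X_val, y_val)
simple_mse = mean((predict_simple(models, r) - t) ** 2 for r, t in zip(X_te, y_te))
weighted_mse = mean(
    (predict_weighted(models, weights, r) - t) ** 2 for r, t in zip(X_te, y_te)
)
print(f"base models: {len(models)}")
print(f"simple-average MSE:   {simple_mse:.4f}")
print(f"weighted-average MSE: {weighted_mse:.4f}")
```

The split is kept in time order because return forecasting is evaluated out of sample. A fuller treatment would add a dimensionality-reduction step before each learner and an explicit diversity measure across base predictions, both of which the abstract mentions but does not specify.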
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.