Na Miao, Mengke Yang, Pingping Han, Jiakun Qiao, Zhaoxuan Che, Fangjun Xu, Xiangyu Dai, Mengjin Zhu
{"title":"A new ensemble learning method stratified sampling blending optimizes conventional blending and improves prediction performance.","authors":"Na Miao, Mengke Yang, Pingping Han, Jiakun Qiao, Zhaoxuan Che, Fangjun Xu, Xiangyu Dai, Mengjin Zhu","doi":"10.1093/bioadv/vbaf002","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Ensemble learning, as a powerful machine learning method, improves overall prediction performance by combining the prediction results of multiple base models. Blending, as a popular ensemble learning method, can train multiple base models, input the resulting prediction results to further train meta model and obtain final prediction results. However, conventional blending divides the training set by simple random sampling, which causes bias and large variance, thus affecting the stability and accuracy of prediction performance. In this study, we propose a new algorithm of stratified sampling blending (ssBlending), which addresses the algorithm instability of conventional blending caused by the random partition of the training set, further improving the prediction accuracy.</p><p><strong>Results: </strong>We used multiple genotype data sets from different species including animal (pig), plant (loblolly pine), and microorganism (yeast) to test the prediction performance of ssBlending. The across-species multi-dataset verification results reveal that ssBlending is superior to conventional blending in terms of prediction accuracy and stability. In addition, we optimized the training set sampling rate (BestH) to facilitate the practical application of the ssBlending algorithm. In summary, this study proposes a completely new algorithm combing stratification strategy with the conventional blending, which provides more options for ensemble learning in various fields.</p><p><strong>Availability and implementation: </strong>https://figshare.com/s/23122a18dc8a35f12ff6.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf002"},"PeriodicalIF":2.4000,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908643/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics advances","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioadv/vbaf002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Motivation: Ensemble learning, as a powerful machine learning method, improves overall prediction performance by combining the prediction results of multiple base models. Blending, as a popular ensemble learning method, can train multiple base models, input the resulting prediction results to further train meta model and obtain final prediction results. However, conventional blending divides the training set by simple random sampling, which causes bias and large variance, thus affecting the stability and accuracy of prediction performance. In this study, we propose a new algorithm of stratified sampling blending (ssBlending), which addresses the algorithm instability of conventional blending caused by the random partition of the training set, further improving the prediction accuracy.
Results: We used multiple genotype data sets from different species including animal (pig), plant (loblolly pine), and microorganism (yeast) to test the prediction performance of ssBlending. The across-species multi-dataset verification results reveal that ssBlending is superior to conventional blending in terms of prediction accuracy and stability. In addition, we optimized the training set sampling rate (BestH) to facilitate the practical application of the ssBlending algorithm. In summary, this study proposes a completely new algorithm combing stratification strategy with the conventional blending, which provides more options for ensemble learning in various fields.
Availability and implementation: https://figshare.com/s/23122a18dc8a35f12ff6.