A new ensemble learning method stratified sampling blending optimizes conventional blending and improves prediction performance.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Bioinformatics Advances. Pub Date: 2025-02-22. eCollection Date: 2025-01-01. DOI: 10.1093/bioadv/vbaf002
Na Miao, Mengke Yang, Pingping Han, Jiakun Qiao, Zhaoxuan Che, Fangjun Xu, Xiangyu Dai, Mengjin Zhu

Abstract

Motivation: Ensemble learning improves overall prediction performance by combining the predictions of multiple base models. Blending, a popular ensemble learning method, trains multiple base models, feeds their predictions into a meta-model as inputs, and uses the trained meta-model to produce the final predictions. However, conventional blending partitions the training set by simple random sampling, which introduces bias and large variance and thus reduces the stability and accuracy of prediction. In this study, we propose stratified sampling blending (ssBlending), a new algorithm that addresses the instability of conventional blending caused by random partitioning of the training set and further improves prediction accuracy.
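
To make the idea concrete, the following is a minimal sketch of blending with a stratified partition of the training set. It is not the authors' implementation. The abstract does not say how strata are defined, so this sketch assumes quantile bins of a continuous target. The function names, the one-feature least-squares base models, and the inverse-error weighting used as a stand-in meta-model are all illustrative assumptions, and the data are synthetic.

```python
import random
from statistics import mean


def stratified_holdout_split(y, holdout_rate, n_strata=4, seed=0):
    """Split indices into (base_train, holdout), drawing the holdout part
    separately within each quantile stratum of y (assumed stratification)."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    base_train, holdout = [], []
    for s in range(n_strata):
        # A contiguous slice of the sorted indices is one quantile stratum.
        lo = s * len(order) // n_strata
        hi = (s + 1) * len(order) // n_strata
        stratum = order[lo:hi]
        rng.shuffle(stratum)
        k = round(holdout_rate * len(stratum))
        holdout.extend(stratum[:k])
        base_train.extend(stratum[k:])
    return sorted(base_train), sorted(holdout)


def fit_line(x, y):
    """Ordinary least squares for y ~ a + b * x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return my - b * mx, b


def blend(X, y, holdout_rate=0.25, seed=0):
    """Fit one base model per feature on the base-training part, then learn
    meta-weights from the base models' errors on the stratified holdout part."""
    base_idx, hold_idx = stratified_holdout_split(y, holdout_rate, seed=seed)
    y_base = [y[i] for i in base_idx]
    models = [fit_line([X[i][j] for i in base_idx], y_base)
              for j in range(len(X[0]))]
    # Stand-in meta-model: weight each base model by its inverse holdout MSE.
    mse = [mean((a + b * X[i][j] - y[i]) ** 2 for i in hold_idx)
           for j, (a, b) in enumerate(models)]
    inv = [1.0 / (m + 1e-12) for m in mse]
    weights = [v / sum(inv) for v in inv]

    def predict(row):
        return sum(w * (a + b * row[j])
                   for j, (w, (a, b)) in enumerate(zip(weights, models)))

    return predict, weights


# Synthetic toy data: feature 0 is informative, feature 1 is mostly noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * x0 + 0.1 * x1 + rng.gauss(0, 0.1) for x0, x1 in X]
predict, weights = blend(X, y)
```

Replacing `stratified_holdout_split` with a plain shuffle of all indices would give the simple random partition of conventional blending. With the stratified version, every quantile range of the target is represented in the part used to train the meta-model.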

Results: We tested the prediction performance of ssBlending on multiple genotype data sets from different species, including an animal (pig), a plant (loblolly pine), and a microorganism (yeast). The cross-species, multi-dataset validation shows that ssBlending is superior to conventional blending in both prediction accuracy and stability. In addition, we optimized the training set sampling rate (BestH) to facilitate practical application of the ssBlending algorithm. In summary, this study proposes a completely new algorithm that combines a stratification strategy with conventional blending, providing more options for ensemble learning in various fields.
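
One generic way to tune a sampling rate is sketched below; the abstract does not describe how BestH was actually obtained, so this is only an assumed procedure. Each candidate rate is scored over repeated partitions, the mean score (accuracy) and standard deviation (stability) are recorded, and the rate with the best mean is kept. The names `select_rate` and `toy_evaluate` are hypothetical, and `toy_evaluate` returns synthetic scores in place of a real fit-and-score step.

```python
import random
from statistics import mean, stdev


def select_rate(rates, evaluate, repeats=20):
    """Return (best_rate, summary); summary maps each rate to the
    (mean, standard deviation) of its scores over repeated partitions."""
    summary = {}
    for rate in rates:
        # The same seeds are reused for every rate, so rates are compared
        # on matched repetitions.
        scores = [evaluate(rate, seed) for seed in range(repeats)]
        summary[rate] = (mean(scores), stdev(scores))
    # Prefer the highest mean score; break ties with the smaller spread.
    best = max(summary, key=lambda r: (summary[r][0], -summary[r][1]))
    return best, summary


def toy_evaluate(rate, seed):
    """Synthetic stand-in for 'fit with this sampling rate, return accuracy'.
    The numbers are made up and are not results from the paper."""
    noise = random.Random(seed).gauss(0, 0.01)
    return 0.6 - (rate - 0.7) ** 2 + noise


best_rate, summary = select_rate([0.5, 0.6, 0.7, 0.8, 0.9], toy_evaluate)
```

In practice `evaluate` would fit the blended model with the given rate and partition seed and return a validation metric, such as the correlation between predicted and observed values.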

Availability and implementation: https://figshare.com/s/23122a18dc8a35f12ff6.
