Boosting diversity in regression ensembles

IF 3.6; Zone 4 (Mathematics); JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Statistical Analysis and Data Mining. Pub Date: 2023-12-30. DOI: 10.1002/sam.11654

Mathias Bourel, Jairo Cugliari, Yannig Goude, Jean-Michel Poggi


Citations: 0

Abstract

Ensemble methods, such as Bagging, Boosting, or Random Forests, often improve on the prediction performance of single learners in both classification and regression tasks. In the context of regression, we propose a gradient boosting-based algorithm incorporating a diversity term, with the aim of constructing distinct learners that enrich the ensemble, trading some individual optimality for global improvement. By verifying the hypotheses of Biau and Cadre's theorem (2021, Advances in Contemporary Statistics and Econometrics: Festschrift in Honour of Christine Thomas-Agnan, Springer), we present a convergence result ensuring that the associated optimization strategy reaches the global optimum. In the experiments, we consider a variety of base learners of increasing complexity: stumps, regression trees, Purely Random Forests, and Breiman's Random Forests. Finally, we use simulated and benchmark datasets, as well as a real-world electricity demand dataset, to show through numerical experiments the suitability of our procedure, examining the behavior not only of the final or aggregated predictor but also of the whole generated sequence.
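The abstract does not give the exact form of the diversity term, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a negative-correlation-style pointwise loss, 0.5*(y - F)**2 - 0.5*kappa*(F - F_bar)**2, where F_bar is the running average of the predictors generated so far (held fixed at each step) and 0 <= kappa < 1 keeps the loss strictly convex in F. Each boosting step fits a regression stump on a single feature to the negative gradient, (y - F) + kappa*(F - F_bar), so the new predictor is pushed slightly away from the sequence average. All function names and parameters (`diverse_boost`, `kappa`, `nu`) are hypothetical, and the returned `seq` array keeps every member F_0, ..., F_M so that the final predictor, the aggregated predictor, and the whole sequence can all be inspected.

```python
import numpy as np


def fit_stump(x, target):
    """Least-squares regression stump on a 1-D feature: (threshold, left, right)."""
    order = np.argsort(x)
    xs, ts = x[order], target[order]
    n = len(xs)
    csum = np.cumsum(ts)
    total = csum[-1]
    # Fallback when no split exists (all x equal): predict the mean everywhere.
    best_score, best = np.inf, (xs[0], ts.mean(), ts.mean())
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # only split between distinct feature values
        left = csum[i - 1] / i
        right = (total - csum[i - 1]) / (n - i)
        # Minimizing the SSE is equivalent to maximizing i*left^2 + (n-i)*right^2.
        score = -(i * left**2 + (n - i) * right**2)
        if score < best_score:
            best_score, best = score, (0.5 * (xs[i - 1] + xs[i]), left, right)
    return best


def predict_stump(stump, x):
    threshold, left, right = stump
    return np.where(x <= threshold, left, right)


def diverse_boost(x, y, n_steps=100, nu=0.1, kappa=0.5):
    """Gradient boosting on the assumed loss 0.5*(y-F)^2 - 0.5*kappa*(F-F_bar)^2.

    Returns (initial constant, list of stumps, training predictions of the sequence).
    """
    if not 0.0 <= kappa < 1.0:
        raise ValueError("kappa must lie in [0, 1) so the loss stays strictly convex in F")
    f0 = float(np.mean(y))
    F = np.full(len(y), f0)
    running_sum = F.copy()
    seq, stumps = [F.copy()], []
    for _ in range(n_steps):
        F_bar = running_sum / len(seq)  # average of F_0, ..., F_{m-1}
        # Negative gradient in F: residual plus a push away from the running average.
        target = (y - F) + kappa * (F - F_bar)
        stump = fit_stump(x, target)
        F = F + nu * predict_stump(stump, x)
        stumps.append(stump)
        seq.append(F.copy())
        running_sum += F
    return f0, stumps, np.array(seq)


def predict_sequence(f0, stumps, x_new, nu=0.1):
    """Predictions of every member F_0, ..., F_M on x_new (nu must match training)."""
    F = np.full(len(x_new), f0)
    out = [F.copy()]
    for stump in stumps:
        F = F + nu * predict_stump(stump, x_new)
        out.append(F.copy())
    return np.array(out)


# Usage on synthetic data (illustrative only, not the paper's experiments).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
f0, stumps, seq = diverse_boost(x, y, n_steps=200, nu=0.1, kappa=0.3)
final_mse = np.mean((y - seq[-1]) ** 2)                # last member of the sequence
aggregated_mse = np.mean((y - seq.mean(axis=0)) ** 2)  # average of all members
```

Setting `kappa=0.0` recovers plain least-squares gradient boosting, and at the first step the diversity term vanishes because F_bar equals F_0. Averaging all members, including the constant F_0, is just one simple aggregation choice made here for illustration.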




Source journal

Statistical Analysis and Data Mining: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

CiteScore

3.20

Self-citation rate

7.70%

Journal description: Statistical Analysis and Data Mining addresses the broad area of data analysis, including statistical approaches, machine learning, data mining, and applications. Topics include statistical and computational approaches for analyzing massive and complex datasets, novel statistical and/or machine learning methods and theory, and state-of-the-art applications with high impact. Of special interest are articles that describe innovative analytical techniques and discuss their application to real problems in a way that is accessible and beneficial to domain experts across science, engineering, and commerce. The journal focuses on papers that satisfy one or more of the following criteria: solve data analysis problems associated with massive, complex datasets; develop innovative statistical approaches, machine learning algorithms, or methods integrating ideas across disciplines (e.g., statistics, computer science, electrical engineering, operations research); formulate and solve high-impact real-world problems that challenge existing paradigms via new statistical and/or computational models; provide surveys of prominent research topics.