MDGP-forest: A novel deep forest for multi-class imbalanced learning based on multi-class disassembly and feature construction enhanced by genetic programming

IF 7.5 1区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhikai Lin , Yong Xu , Kunhong Liu , Liyan Chen
{"title":"MDGP-forest: A novel deep forest for multi-class imbalanced learning based on multi-class disassembly and feature construction enhanced by genetic programming","authors":"Zhikai Lin ,&nbsp;Yong Xu ,&nbsp;Kunhong Liu ,&nbsp;Liyan Chen","doi":"10.1016/j.patcog.2025.112070","DOIUrl":null,"url":null,"abstract":"<div><div>Class imbalance is a significant challenge in the field of machine learning. Due to factors such as quantity differences and feature overlap among classes, the imbalance problem for multiclass classification is more difficult than that for binary one, which leads to the existing research primarily focusing on the binary classification scenario. This study proposes a novel deep forest algorithm with the aid of Genetic Programming (GP), MDGP-Forest, for the multiclass imbalance problem. MDGP-Forest utilizes Multi-class Disassembly and undersampling based on instance hardness between layers to obtain multiple binary classification datasets, each corresponding to a GP population for feature construction. The improved fitness function of GP assesses the incremental importance of the constructed features for enhanced vectors, introducing higher-order information into subsequent layers to improve predicted performance. Each GP population generates a set of new features that improve the separability of classes, empowering MDGP-Forest with the capability to address the challenge of overlapping features among multiple classes. We thoroughly evaluate the classification performance of MDGP-Forest on 35 datasets. The experimental results demonstrate that MDGP-Forest significantly outperforms existing methods in addressing multiclass imbalance problems, exhibiting superior predictive performance.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112070"},"PeriodicalIF":7.5000,"publicationDate":"2025-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325007307","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Class imbalance is a significant challenge in the field of machine learning. Due to factors such as quantity differences and feature overlap among classes, the imbalance problem for multiclass classification is more difficult than that for binary one, which leads to the existing research primarily focusing on the binary classification scenario. This study proposes a novel deep forest algorithm with the aid of Genetic Programming (GP), MDGP-Forest, for the multiclass imbalance problem. MDGP-Forest utilizes Multi-class Disassembly and undersampling based on instance hardness between layers to obtain multiple binary classification datasets, each corresponding to a GP population for feature construction. The improved fitness function of GP assesses the incremental importance of the constructed features for enhanced vectors, introducing higher-order information into subsequent layers to improve predicted performance. Each GP population generates a set of new features that improve the separability of classes, empowering MDGP-Forest with the capability to address the challenge of overlapping features among multiple classes. We thoroughly evaluate the classification performance of MDGP-Forest on 35 datasets. The experimental results demonstrate that MDGP-Forest significantly outperforms existing methods in addressing multiclass imbalance problems, exhibiting superior predictive performance.
MDGP-forest:一种基于多类分解和遗传规划增强特征构建的新型多类不平衡学习深度森林
类不平衡是机器学习领域的一个重大挑战。由于类之间存在数量差异和特征重叠等因素,多类分类的不平衡问题比二类分类的不平衡问题更加困难,因此现有的研究主要集中在二类分类场景。针对多类不平衡问题,提出了一种基于遗传规划的深度森林算法MDGP-Forest。MDGP-Forest利用多层分解(Multi-class Disassembly)和基于层间实例硬度的欠采样(undersampling)获得多个二值分类数据集,每个数据集对应一个GP种群进行特征构建。改进的GP适应度函数评估构造特征对增强向量的增量重要性,将高阶信息引入后续层以提高预测性能。每个GP填充都会生成一组新特性,这些特性可以改善类的可分离性,从而使MDGP-Forest能够解决多个类之间重叠特性的挑战。我们全面评估了MDGP-Forest在35个数据集上的分类性能。实验结果表明,MDGP-Forest在解决多类不平衡问题方面显著优于现有方法,具有优越的预测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Pattern Recognition
Pattern Recognition 工程技术-工程:电子与电气
CiteScore
14.40
自引率
16.20%
发文量
683
审稿时长
5.6 months
期刊介绍: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信