Dynamic Ensemble Framework for Imbalanced Data Classification
Authors: Tuanfei Zhu; Xingchen Hu; Xinwang Liu; En Zhu; Xinzhong Zhu; Huiying Xu
Journal: IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 5, pp. 2456-2471
Published: 2025-01-13
DOI: 10.1109/TKDE.2025.3528719
URL: https://ieeexplore.ieee.org/document/10839625/
Citation count: 0
Abstract
Dynamic ensembles have considerably more room than static ensembles to improve the classification of imbalanced data. In practice, however, dynamic ensemble schemes have been far less successful than static ensemble methods in imbalanced learning. Through an in-depth analysis of the behavior of dynamic ensembles, we find that several important problems must be addressed to release their full potential, including but not limited to: correcting the component classifiers' bias towards the majority classes; increasing the proportion of positive classifiers (i.e., component classifiers that make the correct prediction) for difficult samples; and providing accurate competence estimates on hard-to-classify samples w.r.t. the classifier pool. Motivated by these findings, we propose a Dynamic Ensemble Framework for imbalanced data classification (imDEF). imDEF first uses the data generation method OREM$_{\mathrm{G}}$ to rebalance the original imbalanced data into multiple synthetic datasets with diverse class distributions. On each synthetic dataset, imDEF then applies a Classification Error-aware Self-Paced Sampling Ensemble (SPSE$_{\mathrm{CE}}$) method that gradually focuses on difficult samples, in order to create a low-bias classifier pool and increase the proportion of positive classifiers for the difficult samples. Finally, imDEF constructs a referee system that produces the competence estimates using an Ensemble Margin-aware Self-Paced Sampling Ensemble (SPSE$_{\mathrm{EM}}$) method. SPSE$_{\mathrm{EM}}$ incrementally strengthens the learning of the hard-to-classify samples so that the competence levels of the component classifiers can be estimated accurately. Extensive experiments demonstrate the effectiveness of imDEF. The source code has been made publicly available on GitHub.
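To make the idea of dynamic selection concrete, the following is a minimal sketch of a generic local-accuracy selection rule on toy one-dimensional data. It is not imDEF, OREM$_{\mathrm{G}}$, or either SPSE variant, and it does not reproduce the paper's referee system; the names `local_competence`, `dynamic_predict`, `static_predict`, the toy `validation` set, and the three threshold classifiers in `pool` are all hypothetical and chosen only for illustration. The sketch shows why per-sample competence estimation can matter: a static majority vote follows the majority-biased classifiers, while dynamic selection can defer to the one classifier that is locally correct.

```python
from collections import Counter


def local_competence(clf, neighbors):
    """Fraction of neighboring validation samples that clf labels correctly."""
    return sum(clf(x) == y for x, y in neighbors) / len(neighbors)


def dynamic_predict(pool, validation, query, k=3):
    """Let the locally most competent classifiers vote on `query`."""
    # Region of competence: the k validation samples closest to the query.
    neighbors = sorted(validation, key=lambda s: abs(s[0] - query))[:k]
    scores = [local_competence(clf, neighbors) for clf in pool]
    best = max(scores)
    # Keep every classifier that ties for the best local accuracy.
    selected = [clf for clf, s in zip(pool, scores) if s == best]
    votes = Counter(clf(query) for clf in selected)
    return votes.most_common(1)[0][0]


def static_predict(pool, query):
    """Plain majority vote over the whole pool, for comparison."""
    votes = Counter(clf(query) for clf in pool)
    return votes.most_common(1)[0][0]


# Toy imbalanced validation data (assumed): label 1 is the minority class.
validation = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (5.0, 0),
              (6.2, 1), (6.6, 1), (7.0, 1)]

# Hypothetical pool: two classifiers biased toward the majority class
# and one with a lower-bias decision threshold.
pool = [
    lambda x: 0,             # always predicts the majority class
    lambda x: int(x > 8.0),  # threshold too high, misses the minority
    lambda x: int(x > 6.0),  # separates the two classes on this toy data
]

if __name__ == "__main__":
    query = 6.5  # a minority-class point
    print("static vote: ", static_predict(pool, query))
    print("dynamic vote:", dynamic_predict(pool, validation, query))
```

In this toy setting only one of the three classifiers is a positive classifier for the minority query, so the static vote is outvoted by the biased majority. The abstract's three concerns map onto the sketch loosely: the biased pool members, the low share of correct classifiers near the minority sample, and the reliance on an accurate local competence score.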
About the journal:
The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering aspects of computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.