{"title":"Mixed Bagging: A Novel Ensemble Learning Framework for Supervised Classification Based on Instance Hardness","authors":"A. Kabir, Carolina Ruiz, S. A. Alvarez","doi":"10.1109/ICDM.2018.00137","DOIUrl":null,"url":null,"abstract":"We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing and equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to achieve a final class prediction. We propose two versions of mixed bagging – one where the bootstraps are grouped as easy, regular or hard, with all bootstraps in one group having the same hardness; and the other where the hardness of bootstraps change gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 Decision Trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing, or equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to produce a final class prediction. We propose two versions of mixed bagging: one in which the bootstraps are grouped as easy, regular, or hard, with all bootstraps in a group having the same hardness; and another in which the hardness of the bootstraps changes gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 decision trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.
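The abstract describes the procedure only at a high level; the sketch below illustrates the general idea in Python and is not the authors' implementation. Several details are assumptions made here for illustration: instance hardness is estimated via out-of-fold cross-validated misclassification probability, the "gradual" variant is approximated with a simple linear schedule that shifts sampling weight from easy toward hard instances across iterations, and scikit-learn's CART decision trees stand in for the C4.5 trees used in the paper.

```python
# Minimal sketch of hardness-weighted ("mixed") bagging, under the assumptions
# stated above; not the authors' reference implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier


def estimate_hardness(X, y, cv=5, max_depth=3):
    """Hardness of each instance = out-of-fold probability of misclassification
    (an assumed stand-in for the leave-rest-out definition in the abstract)."""
    proba = cross_val_predict(
        DecisionTreeClassifier(max_depth=max_depth), X, y,
        cv=cv, method="predict_proba")
    return 1.0 - proba[np.arange(len(y)), y]  # 1 - P(true class)


def mixed_bagging_predict(X, y, X_test, n_estimators=25, max_depth=3, seed=0):
    rng = np.random.default_rng(seed)
    hardness = estimate_hardness(X, y)
    votes = []
    for t in range(n_estimators):
        # Assumed linear schedule: early bootstraps under-represent hard
        # instances (alpha < 0), later ones over-represent them (alpha > 0).
        alpha = -1.0 + 2.0 * t / max(n_estimators - 1, 1)
        weights = 1.0 + alpha * (hardness - hardness.mean())
        weights = np.clip(weights, 1e-6, None)
        p = weights / weights.sum()
        # Hardness-biased bootstrap sample of the same size as the training set.
        idx = rng.choice(len(y), size=len(y), replace=True, p=p)
        clf = DecisionTreeClassifier(max_depth=max_depth, random_state=t)
        clf.fit(X[idx], y[idx])
        votes.append(clf.predict(X_test))
    votes = np.asarray(votes, dtype=int)
    # Aggregate the committee by majority vote.
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)


if __name__ == "__main__":
    X, y = make_classification(n_samples=500, random_state=42)
    preds = mixed_bagging_predict(X[:400], y[:400], X[400:])
    print("held-out accuracy:", (preds == y[400:]).mean())
```

The grouped variant described in the abstract (easy, regular, and hard bootstrap groups) can be obtained from the same skeleton by holding alpha fixed at a negative, zero, or positive value for each group of iterations instead of varying it per iteration.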