{"title":"Lazy Bagging for Classifying Imbalanced Data","authors":"Xingquan Zhu","doi":"10.1109/ICDM.2007.95","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a lazy bagging (LB) design, which builds bootstrap replicate bags based on the characteristics of the test instances. Upon receiving a test instance Ik, LB will trim bootstrap bags by taking Ik's nearest neighbors in the training set into consideration. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information for learners to refine their local decision boundaries for classifying this instance. By taking full advantage of Ik's nearest neighbors, the base learners are able to receive less bias and variance in classifying Ik. This strategy is beneficial for classifying imbalanced data because refining local decision boundaries can help a learner reduce its inherent bias towards the majority class and improve its performance on minority class examples. Our experimental results will confirm that LB outperforms C4.5 and TB in terms of reducing classification error, and most importantly this error reduction is largely contributed from LB's improvement on minority class examples.","PeriodicalId":233758,"journal":{"name":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Conference on Data Mining (ICDM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2007.95","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 31
Abstract
In this paper, we propose a lazy bagging (LB) design, which builds bootstrap replicate bags based on the characteristics of the test instances. Upon receiving a test instance Ik, LB trims the bootstrap bags by taking Ik's nearest neighbors in the training set into consideration. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information for learners to refine their local decision boundaries when classifying that instance. By taking full advantage of Ik's nearest neighbors, the base learners achieve lower bias and variance in classifying Ik. This strategy is beneficial for classifying imbalanced data because refining local decision boundaries helps a learner reduce its inherent bias towards the majority class and improve its performance on minority class examples. Our experimental results confirm that LB outperforms C4.5 and traditional bagging (TB) in reducing classification error, and, most importantly, that this error reduction largely comes from LB's improvement on minority class examples.
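The sketch below illustrates the lazy bagging idea as described in the abstract: for each test instance, build bootstrap bags that also incorporate the instance's nearest neighbors, train a base learner per bag, and combine predictions by majority vote. This is a minimal illustration, not the paper's exact procedure: the function name lazy_bagging_predict, the parameters n_bags and k, the choice of scikit-learn's DecisionTreeClassifier as a stand-in for C4.5, and the specific way neighbors are mixed into each bag (appending them to a standard bootstrap sample) are all assumptions.

```python
# Hedged sketch of lazy bagging (LB) for a single test instance.
# Assumptions: appending the k nearest neighbors to each bootstrap bag
# approximates the paper's bag "trimming"; a CART tree stands in for C4.5.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def lazy_bagging_predict(X_train, y_train, x_test, n_bags=10, k=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X_train)

    # Find the test instance's k nearest neighbors in the training set.
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    neighbor_idx = nn.kneighbors(x_test.reshape(1, -1), return_distance=False)[0]

    votes = []
    for _ in range(n_bags):
        # Standard bootstrap sample of the training set ...
        boot_idx = rng.integers(0, n, size=n)
        # ... augmented with the test instance's nearest neighbors, so each
        # base learner sharpens its local decision boundary around x_test.
        bag_idx = np.concatenate([boot_idx, neighbor_idx])
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X_train[bag_idx], y_train[bag_idx])
        votes.append(tree.predict(x_test.reshape(1, -1))[0])

    # Majority vote over the base learners.
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]
```

Because the bags depend on the test instance, the ensemble must be rebuilt per query (hence "lazy"); X_train and x_test are assumed to be NumPy arrays. Injecting minority-class neighbors into each bag is what lets the local boundary shift toward the minority class, which is the mechanism the abstract credits for the error reduction.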