Lazy Bagging for Classifying Imbalanced Data

Xingquan Zhu
Seventh IEEE International Conference on Data Mining (ICDM 2007). Published 2007-10-28. DOI: 10.1109/ICDM.2007.95 (https://doi.org/10.1109/ICDM.2007.95)
Citations: 31

Abstract

In this paper, we propose a lazy bagging (LB) design, which builds bootstrap replicate bags based on the characteristics of the test instance. Upon receiving a test instance I_k, LB trims the bootstrap bags by taking I_k's nearest neighbors in the training set into consideration. Our hypothesis is that an unlabeled instance's nearest neighbors provide valuable information that helps learners refine their local decision boundaries for classifying that instance. By taking full advantage of I_k's nearest neighbors, the base learners can achieve lower bias and variance when classifying I_k. This strategy is beneficial for imbalanced data because refining local decision boundaries helps a learner reduce its inherent bias towards the majority class and improve its performance on minority class examples. Our experimental results confirm that LB outperforms C4.5 and TB (traditional bagging) in reducing classification error and, most importantly, that this error reduction is largely attributable to LB's improvement on minority class examples.
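The lazy bagging idea described in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions rather than the paper's specification: the abstract does not state how neighbors are mixed into each bag, so this sketch draws N - k instances from the full training set and k instances from the test instance's k nearest neighbors; a single-feature decision stump stands in for C4.5 to keep the example self-contained; and all function and parameter names (`lazy_bagging_predict`, `k`, `n_bags`, and so on) are hypothetical.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_neighbors(X, x, k):
    """Indices of the k training rows closest to the test instance x."""
    order = sorted(range(len(X)), key=lambda i: squared_distance(X[i], x))
    return order[:k]


def fit_stump(X, y):
    """Fit a one-feature threshold rule that minimizes training errors.

    Stand-in base learner (assumption): the paper uses C4.5 trees.
    Returns (feature, threshold, left_label, right_label).
    """
    majority = Counter(y).most_common(1)[0][0]
    best = (len(y) + 1, 0, float("inf"), majority, majority)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(lab != l_lab for lab in left)
            errors += sum(lab != r_lab for lab in right)
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def stump_predict(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


def lazy_bagging_predict(X, y, x, k=3, n_bags=11, seed=0):
    """Classify x by majority vote over bags built specifically for x.

    Assumed mixing rule: each bag holds n - k bootstrap samples from the
    whole training set plus k bootstrap samples from x's nearest neighbors.
    """
    rng = random.Random(seed)
    n = len(X)
    neighbors = nearest_neighbors(X, x, k)
    votes = Counter()
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n - k)]
        idx += [rng.choice(neighbors) for _ in range(k)]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        votes[stump_predict(stump, x)] += 1
    return votes.most_common(1)[0][0]


# Toy imbalanced data: ten majority points (label 0), two minority points (label 1).
X = [[i / 10] for i in range(10)] + [[2.0], [2.1]]
y = [0] * 10 + [1] * 2
print(lazy_bagging_predict(X, y, [2.05], k=2))  # query next to the minority points
print(lazy_bagging_predict(X, y, [0.0], k=2))   # query inside the majority region
```

Because the bags are built only after the test instance arrives, every bag in this sketch is guaranteed to contain examples from the query's neighborhood, which is what lets the base learner place its boundary correctly near a minority-class query even when plain bootstrap sampling might leave minority examples out of a bag.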