Borderline over-sampling for imbalanced data classification

Int. J. Knowl. Eng. Soft Data Paradigms. Pub Date: 2009-11-10. DOI: 10.1504/IJKESDP.2011.039875

Hien M. Nguyen, E. Cooper, K. Kamei


Citations: 344

Abstract

Traditional classification algorithms usually predict the minority class of imbalanced data sets with poor accuracy. This paper proposes a new method for dealing with imbalanced data sets by over-sampling the borderline minority class instances. A Support Vector Machine (SVM) classifier is then trained to predict future instances. Compared with other over-sampling methods, the proposed method focuses only on the minority class instances residing along the decision boundary, because this region is the most crucial for establishing the decision boundary. Furthermore, the artificial minority instances are generated so that regions of the minority class with fewer majority class instances are expanded by extrapolation, while elsewhere the current boundary of the minority class is consolidated by interpolation. Experimental results show that the proposed method achieves better performance than other over-sampling methods.
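The extrapolate-or-interpolate rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how borderline instances are identified or what counts as "fewer" majority instances, so the sketch makes several assumptions. The caller supplies the borderline minority points directly, "fewer" is taken to mean that less than half of the `m` nearest neighbours belong to the majority class, and the step size `rho` is drawn uniformly from [0, 1). The names `nearest`, `oversample_borderline`, `k`, `m`, and `rho` are hypothetical. The subsequent SVM training step is not shown.

```python
import math
import random


def nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), skipping `point` itself."""
    others = [c for c in candidates if c != point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def oversample_borderline(minority, majority, borderline, n_new, k=2, m=3, seed=0):
    """Generate n_new synthetic minority points around the given borderline points.

    Points are tuples of floats. `borderline` is assumed to be chosen by the
    caller; the half-of-m threshold below is also an assumption.
    """
    rng = random.Random(seed)
    majority_set = set(majority)
    everything = list(minority) + list(majority)
    synthetic = []
    for i in range(n_new):
        # Cycle through the borderline points so each gets a similar share.
        x = borderline[i % len(borderline)]
        # Count majority-class points among the m nearest neighbours of x.
        around = nearest(x, everything, m)
        n_major = sum(1 for q in around if q in majority_set)
        # Pick one of the k nearest minority neighbours as the partner.
        partner = rng.choice(nearest(x, minority, k))
        rho = rng.random()
        if n_major < m / 2:
            # Few majority points nearby: extrapolate beyond x, away from the
            # partner, which expands the minority region.
            new = tuple(a + rho * (a - b) for a, b in zip(x, partner))
        else:
            # Many majority points nearby: interpolate between x and the
            # partner, which consolidates the existing minority boundary.
            new = tuple(a + rho * (b - a) for a, b in zip(x, partner))
        synthetic.append(new)
    return synthetic


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    majority = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
    print(oversample_borderline(minority, majority, [(1.0, 0.0)], n_new=4))
```

In practice the borderline set would come from some boundary-aware criterion, and the augmented minority set would then be combined with the majority data before training the classifier.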
