Borderline over-sampling for imbalanced data classification
Hien M. Nguyen, E. Cooper, K. Kamei
Int. J. Knowl. Eng. Soft Data Paradigms. Publication date: 2009-11-10. DOI: 10.1504/IJKESDP.2011.039875 (https://doi.org/10.1504/IJKESDP.2011.039875)
Traditional classification algorithms usually predict the minority class of imbalanced data sets with poor accuracy. This paper proposes a new method for handling imbalanced data sets by over-sampling the borderline minority class instances. A Support Vector Machine (SVM) classifier is then trained to predict future instances. In contrast to other over-sampling methods, the proposed method focuses only on the minority class instances that lie along the decision boundary, because this region is the most crucial for establishing that boundary. Furthermore, the artificial minority instances are generated so that regions of the minority class with fewer majority class instances are expanded by extrapolation, whereas elsewhere the current boundary of the minority class is consolidated by interpolation. Experimental results show that the proposed method performs better than other over-sampling methods.
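
The interpolation/extrapolation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not give the details, so several choices below are assumptions made purely for illustration: a minority instance is treated as "borderline" when at least one of its k nearest neighbours belongs to the majority class (a nearest-neighbour stand-in, not an SVM-based selection); "fewer majority class instances" is read as "fewer than half of the k neighbours"; and the step size is drawn uniformly from [0, 1). The function name `borderline_oversample` and its parameters are hypothetical. The SVM training step is omitted.

```python
import math
import random


def nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean), excluding `point` itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def borderline_oversample(minority, majority, k=3, n_new=10, seed=0):
    """Generate up to n_new synthetic minority points near the class boundary.

    Illustrative assumptions (not taken from the abstract):
    - borderline = minority point with at least one majority point among
      its k nearest neighbours in the whole data set;
    - extrapolate when fewer than half of those neighbours are majority,
      otherwise interpolate.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]

    # Find borderline minority points and count their majority neighbours.
    borderline = []
    for p in minority:
        neigh = sorted(
            (q for q in labelled if q[0] is not p),
            key=lambda q: math.dist(p, q[0]),
        )[:k]
        n_major = sum(1 for _, label in neigh if label == 0)
        if n_major > 0:
            borderline.append((p, n_major))
    if not borderline:
        return []

    synthetic = []
    for i in range(n_new):
        p, n_major = borderline[i % len(borderline)]
        partner = rng.choice(nearest(p, minority, k))
        step = rng.random()
        if n_major < k / 2:
            # Few majority neighbours: extrapolate, pushing the minority
            # region outward, away from the chosen minority neighbour.
            new = tuple(a + step * (a - b) for a, b in zip(p, partner))
        else:
            # Many majority neighbours: interpolate, staying on the segment
            # between the borderline point and its minority neighbour.
            new = tuple(a + step * (b - a) for a, b in zip(p, partner))
        synthetic.append(new)
    return synthetic
```

Points are plain tuples of floats. The returned synthetic points would be appended to the minority class before training a classifier on the rebalanced data.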