Jianjun Zhang, Ting Wang, Wing W. Y. Ng, Shuai Zhang, C. Nugent
2019 International Conference on Machine Learning and Cybernetics (ICMLC), July 2019. DOI: 10.1109/ICMLC48188.2019.8949290
Undersampling Near Decision Boundary for Imbalance Problems
Undersampling a dataset to rebalance its class distribution is an effective way to handle class imbalance problems. However, randomly removing majority-class examples under a uniform distribution may cause unnecessary information loss, degrading the performance of classifiers trained on the rebalanced dataset. On the other hand, examples differ in their sensitivity to class imbalance: a higher sensitivity means an example is more easily affected by class imbalance, and this can be used to guide the selection of examples for rebalancing the class distribution and boosting classifier performance. Therefore, in this paper we propose a novel undersampling method, UnderSampling using Sensitivity (USS), based on the sensitivity of each majority example. Examples with low sensitivities are noisy or safe examples, while examples with high sensitivities are borderline examples. In USS, majority examples with higher sensitivities are more likely to be selected. Experiments on 20 datasets confirm the superiority of USS over one baseline method and five resampling methods.
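The abstract does not spell out how sensitivity is computed, so the sketch below is only a hedged illustration of the general idea, not the paper's USS algorithm: sensitivity is approximated by the fraction of minority-class points among each majority example's k nearest neighbours (borderline examples score high, safe or interior examples score low), and majority examples are then kept with probability proportional to that score until the classes are balanced. The function name and the k-NN proxy are assumptions for illustration.

```python
import numpy as np

def sensitivity_undersample(X, y, k=5, rng=None):
    """Sensitivity-guided undersampling sketch (NOT the paper's exact USS).

    Sensitivity proxy: fraction of minority-class points among each
    majority example's k nearest neighbours, so borderline examples
    are more likely to be kept than safe or noisy ones.
    """
    rng = np.random.default_rng(rng)
    Xa = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # identify majority and minority classes (binary setting assumed)
    classes, counts = np.unique(y, return_counts=True)
    maj = classes[np.argmax(counts)]
    maj_idx = np.flatnonzero(y == maj)
    min_idx = np.flatnonzero(y != maj)

    # sensitivity of each majority example: share of minority neighbours
    sens = np.empty(len(maj_idx))
    for i, m in enumerate(maj_idx):
        d = np.linalg.norm(Xa - Xa[m], axis=1)
        d[m] = np.inf                      # exclude the point itself
        nn = np.argsort(d)[:k]             # indices of k nearest neighbours
        sens[i] = np.mean(y[nn] != maj)

    # keep as many majority examples as there are minority examples,
    # sampled without replacement, weighted by sensitivity
    w = sens + 1e-9                        # avoid an all-zero weight vector
    keep = rng.choice(maj_idx, size=len(min_idx), replace=False,
                      p=w / w.sum())
    sel = np.sort(np.concatenate([keep, min_idx]))
    return Xa[sel], y[sel]
```

After resampling, both classes contribute the same number of examples, but the retained majority points are biased toward the decision boundary rather than drawn uniformly.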