C. C. Ceballes-Serrano, S. García-López, J. A. Jaramillo-Garzón, G. Castellanos-Domínguez
2012 XVII Symposium of Image, Signal Processing, and Artificial Vision (STSIVA), published 2012-11-12. DOI: 10.1109/STSIVA.2012.6340585 (https://doi.org/10.1109/STSIVA.2012.6340585)
A strategy for classifying imbalanced data sets based on particle swarm optimization
Learning from imbalanced data has attracted great interest in the machine learning community because imbalance arises in many practical applications and undermines the reliability of learning algorithms. A dataset is imbalanced if the number of observations differs greatly between classes. Classification methods that do not account for this tend to produce decision boundaries heavily biased towards the majority class. Today, ensemble methods such as DataBoost-IM combine Boosting with sampling strategies such as oversampling. However, their performance tends to degrade when the input data is very noisy. This work presents a new method for imbalanced data, called SwarmBoost, that combines Boosting, oversampling, and subsampling, using optimization criteria to select samples. The results show that SwarmBoost performs better than DataBoost-IM and SMOTE on several datasets.
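
To make the imbalance problem described in the abstract concrete, the following minimal sketch measures the class ratio of a toy dataset, shows why a majority-only classifier looks deceptively accurate, and balances the classes by plain random oversampling. This is not SwarmBoost, DataBoost-IM, or SMOTE; the function names (`imbalance_ratio`, `random_oversample`) and the toy data are assumptions made purely for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: 95 majority (label 0) and 5 minority (label 1) observations.
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

ratio_before = imbalance_ratio(y)

# A classifier that always predicts the majority class scores high accuracy
# while never detecting a single minority observation.
majority = Counter(y).most_common(1)[0][0]
predictions = [majority] * len(y)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
minority_recall = sum(p == 1 and t == 1 for p, t in zip(predictions, y)) / y.count(1)

X_bal, y_bal = random_oversample(X, y)
ratio_after = imbalance_ratio(y_bal)

print(f"ratio before: {ratio_before}, after: {ratio_after}")
print(f"majority-only accuracy: {accuracy}, minority recall: {minority_recall}")
```

Random duplication is the simplest form of oversampling and adds no new information; the methods named in the abstract instead choose or generate samples more selectively.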