Authors: Tince Etlin Tallo, Aina Musdholifah
Published in: 2018 4th International Conference on Science and Technology (ICST), 2018-08-01
DOI: 10.1109/ICSTC.2018.8528591 (https://doi.org/10.1109/ICSTC.2018.8528591)
The Implementation of Genetic Algorithm in Smote (Synthetic Minority Oversampling Technique) for Handling Imbalanced Dataset Problem
An imbalanced dataset is one in which a minority class has far fewer instances than the other classes. This imbalance can degrade the performance of standard classification algorithms, biasing their predictions toward the majority class. The SMOTE method addresses the imbalance by creating synthetic instances of the minority class. However, SMOTE can overgeneralize because it generates the same number of synthetic instances for every minority instance, regardless of how the instances are distributed. As a result, the boundaries between classes become unclear. The SMOTE-Simple Genetic Algorithm (SMOTE-SGA) method uses a genetic algorithm to determine the sampling rate of each instance, so that instances receive unequal numbers of synthetic instances. Tests were performed on several imbalanced datasets, comparing classification results measured with G-means and F-measure. The results show that applying a genetic algorithm to SMOTE improves classification, yielding better G-means and F-measure values.
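The idea of a per-instance sampling rate can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so the code below leaves the genetic search out and only shows the two pieces it would connect: SMOTE-style interpolation driven by a vector of per-instance counts, and the two evaluation metrics. The function names, the integer `rates` encoding, and the parameters `k` and `seed` are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def smote_per_instance(minority, rates, k=3, seed=0):
    """Create rates[i] synthetic points from minority[i] by SMOTE interpolation.

    `rates` is a hypothetical encoding of what a genetic algorithm would
    search over: one non-negative integer per minority instance.
    """
    if len(minority) != len(rates):
        raise ValueError("need one sampling rate per minority instance")
    rng = random.Random(seed)
    synthetic = []
    for i, (x, n_new) in enumerate(zip(minority, rates)):
        # k nearest minority neighbours of x (Euclidean), excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        if not neighbours:
            continue
        for _ in range(n_new):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position on the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def g_mean_and_f_measure(y_true, y_pred, positive=1):
    """Return (G-mean, F-measure) for a binary task; `positive` is the minority label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return g_mean, f_measure


# Toy usage: four minority points with unequal sampling rates.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rates = [3, 0, 1, 2]
new_points = smote_per_instance(minority, rates, k=2)
print(len(new_points))  # 6, the sum of the rates
```

Plain SMOTE corresponds to setting every entry of `rates` to the same value. Under this sketch, a genetic algorithm would evolve the `rates` vector, train a classifier on the original data plus the synthetic points, and score each candidate with a metric such as the G-mean computed above.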