LOF-enhanced SMOTE algorithm for imbalanced dataset
Zhuangzhuang Zhang, Jing Hu, Tiecheng Song
International Conference on Electronic Information Technology, vol. 12719, published 2023-08-15
DOI: 10.1117/12.2685807 (https://doi.org/10.1117/12.2685807)
Citation count: 0
Abstract
This paper proposes LOF-Enhanced SMOTE, a new algorithm for handling imbalanced datasets in machine learning tasks. Because some classes in an imbalanced dataset have far fewer samples than others, classifier performance may be negatively affected. To address this, we build on the SMOTE algorithm by introducing the Local Outlier Factor (LOF) algorithm to remove boundary noise, and we use a Gaussian kernel function to account for the similarity of generated samples. We conduct experiments on UNSW-NB15, a real intrusion detection dataset. The results show that LOF-Enhanced SMOTE outperforms the SMOTE and Borderline-SMOTE algorithms overall, and significantly outperforms them in detecting certain minority classes. This indicates that LOF-Enhanced SMOTE can effectively address the classification problem posed by imbalanced datasets.
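The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that LOF is computed on the minority class alone, that samples whose LOF score exceeds a threshold are discarded as noise, and that the Gaussian kernel is used to weight which neighbour is chosen for SMOTE-style interpolation. The function names and the parameters `k`, `lof_threshold`, and `sigma` are hypothetical and chosen for illustration.

```python
import math
import random


def knn(points, i, k):
    """Return up to k (distance, index) pairs nearest to points[i], excluding itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return dists[:k]


def lof_scores(points, k):
    """Standard Local Outlier Factor: about 1 for inliers, much larger for outliers."""
    n = len(points)
    neigh = [knn(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour
    lrd = []  # local reachability density
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))
    return [sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


def lof_smote(minority, n_new, k=3, lof_threshold=1.5, sigma=1.0, seed=0):
    """Drop high-LOF minority samples, then interpolate between the remaining ones.

    Assumption: the Gaussian kernel weights the neighbour choice, so that
    closer (more similar) neighbours are picked more often.
    """
    rng = random.Random(seed)
    scores = lof_scores(minority, k)
    kept = [p for p, s in zip(minority, scores) if s <= lof_threshold]
    if len(kept) < 2:
        raise ValueError("too few minority samples left after LOF filtering")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(kept))
        neighbours = knn(kept, i, k)
        weights = [math.exp(-d * d / (2 * sigma ** 2)) for d, _ in neighbours]
        if sum(weights) == 0:  # all weights underflowed: fall back to uniform choice
            weights = None
        _, j = rng.choices(neighbours, weights=weights, k=1)[0]
        gap = rng.random()  # SMOTE interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(kept[i], kept[j])))
    return kept, synthetic


# Toy minority class: a tight cluster plus one isolated point acting as noise.
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.1, 0.1), (0.25, 0.15), (5.0, 5.0)]
kept, synthetic = lof_smote(minority, n_new=5, sigma=0.2)
print(len(kept), len(synthetic))
```

On this toy data the isolated point at (5.0, 5.0) receives a high LOF score and is filtered out, so every synthetic sample is interpolated between points of the dense cluster. Plain SMOTE would instead keep the isolated point and could generate samples along the line between it and the cluster.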