Title: Anomaly-based intrusion detection system based on SMOTE-IPF, Whale Optimization Algorithm, and ensemble learning
Authors: Tibebu Bekele Shana, Neetu Kumari, Mayank Agarwal, Samrat Mondal, Upaka Rathnayake
Journal: Intelligent Systems with Applications, volume 27, Article 200543
Published: 2025-06-14
DOI: 10.1016/j.iswa.2025.200543
URL: https://www.sciencedirect.com/science/article/pii/S2667305325000699
Citation count: 0
Abstract
Cybersecurity is a major worldwide problem. Intrusion detection systems (IDS) help protect networks by distinguishing malicious entries from legitimate ones in network traffic data. An IDS has considerable potential for detecting dynamic cyber threats, spotting anomalies, and identifying malicious behavior within the network. In this paper, we propose machine learning (ML) models that use the Synthetic Minority Over-sampling Technique (SMOTE) combined with an Iterative Partitioning Filter (IPF) to handle class imbalance and the Whale Optimization Algorithm (WOA) for feature selection. Class imbalance often produces poorly constructed ML models that favor the majority class. In addition, skipping feature selection can raise computational cost without improving accuracy. This study uses Bagging, AdaBoost, Extreme Gradient Boosting (XGBoost), and the Extra Trees Classifier as classification models. The proposed method is assessed on two widely used datasets, NSL-KDD and UNSW-NB15. The models are trained with K-fold cross-validation to reduce the risk of overfitting and are evaluated on accuracy, precision, recall, and F1-score. The experimental results show that the Extra Trees Classifier significantly outperforms the baseline models on all evaluation measures for multi-class classification, achieving an accuracy of 99.9% on the NSL-KDD dataset and 97% on the UNSW-NB15 dataset.
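
To make the oversampling idea concrete, the following is a minimal pure-Python sketch of the interpolation step at the core of plain SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is an illustration only and not the authors' pipeline. The IPF noise-filtering stage is omitted, and the function name `smote_oversample`, its parameters `k` and `seed`, and the toy two-feature data are all assumptions made for this example.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Plain SMOTE only (assumption): the IPF filtering stage is not included.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 8 "normal" records and 3 "attack" records, two features each
majority = [(0.1 * i, 0.2 * i) for i in range(8)]
minority = [(5.0, 5.0), (5.5, 4.8), (6.0, 5.2)]

# Generate just enough synthetic attack records to match the majority class size
new_points = smote_oversample(minority, n_synthetic=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints: 8 8
```

Because every synthetic point lies between two existing minority samples, the minority class grows without simply duplicating records. In practice, a maintained library implementation would normally be preferred over a hand-written version like this one.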