Anomaly-based intrusion detection system based on SMOTE-IPF, Whale Optimization Algorithm, and ensemble learning

Intelligent Systems with Applications. Pub Date: 2025-06-14. DOI: 10.1016/j.iswa.2025.200543

Tibebu Bekele Shana, Neetu Kumari, Mayank Agarwal, Samrat Mondal, Upaka Rathnayake

Volume 27, Article 200543. https://www.sciencedirect.com/science/article/pii/S2667305325000699


Abstract

Cybersecurity is a major worldwide problem. Intrusion detection systems (IDS) help secure networks by distinguishing malicious entries from legitimate ones in network traffic data. They have considerable potential for detecting dynamic cyber threats, identifying anomalies, and flagging malicious conduct within the network. In this paper, we propose Machine Learning (ML) models with an emphasis on the Synthetic Minority Over-sampling Technique (SMOTE) with Iterative Partitioning Filter (IPF) to address class imbalance and the Whale Optimization Algorithm (WOA) for feature selection. Class imbalance often yields poorly constructed ML models that favor the majority class. In addition, omitting feature selection can raise computational complexity without improving accuracy. This study uses Bagging, AdaBoost, Extreme Gradient Boosting (XGBoost), and the Extra Trees Classifier as classification models. The proposed method is assessed on two widely used datasets, NSL-KDD and UNSW-NB15. The models are trained with K-fold cross-validation to reduce potential overfitting and are evaluated on accuracy, precision, recall, and F1-score. The experimental results show that the Extra Trees Classifier significantly outperforms the baseline models on all evaluation measures for multi-class classification of the IDS, achieving accuracies of 99.9% on NSL-KDD and 97% on UNSW-NB15.
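To make the oversampling idea in the abstract concrete, the following is a minimal sketch of plain SMOTE interpolation in standard-library Python. It is not the authors' implementation: the IPF noise-filtering stage is omitted, and the function name `smote`, the parameters `k` and `seed`, and the toy two-dimensional data are all assumptions chosen for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE sketch: create n_new synthetic minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    The IPF filtering stage named in the abstract is NOT implemented here.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 20 majority samples and 4 minority samples.
majority = [(float(x), float(y)) for x in range(5) for y in range(4)]
minority = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point is an interpolation between two existing minority samples, the new points stay inside the region the minority class already occupies. A filtering stage such as IPF would then run on the oversampled data to discard noisy samples.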

