Attention-driven multi-model architecture for unbalanced network traffic intrusion detection via extreme gradient boosting
Oluwadamilare Harazeem Abdulganiyu, Taha Ait Tchakoucht, Ahmed El Hilali Alaoui, Yakub Kayode Saheed
Intelligent Systems with Applications, Volume 26, Article 200519. Published 2025-04-11. DOI: 10.1016/j.iswa.2025.200519
https://www.sciencedirect.com/science/article/pii/S2667305325000456
Abstract
Network Intrusion Detection Systems (NIDS) struggle to identify rare attack instances because of the inherent class imbalance and diversity of network traffic. This imbalance, typically a dominance of benign traffic, reduces the effectiveness of traditional detection methods. To address this, we propose CWFLAM-VAE, an attention-driven multi-model architecture that combines Class-Wise Focal Loss, a Variational Autoencoder, and Extreme Gradient Boosting. CWFLAM-VAE generates synthetic rare-class attack data while preserving the original feature distribution, mitigating imbalance and improving classification performance. We evaluated the proposed system on two datasets: NSL-KDD, whose traffic distribution is skewed toward the majority class, and CSE-CIC-IDS2018, in which approximately 83% of the data is benign network traffic. We compared our method with existing sampling techniques (SMOTE, ROS, ADASYN, RUS) and existing classifiers (Logistic Regression, KNN, SVM, Decision Tree, LSTM, CNN). The experimental findings show that CWFLAM-VAE is effective at resolving class imbalance, with Extreme Gradient Boosting surpassing the alternative machine learning techniques, particularly in detecting rare attack traffic: it achieves F-scores of 97.6% and 98.1% and false positive rates of 0.17 and 0.27 on the two datasets, respectively.
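To make the Class-Wise Focal Loss component named in the abstract more concrete, the following is a minimal sketch of a focal loss with a per-class weight, written in plain Python. The abstract does not give the authors' exact formulation, so the function name `class_wise_focal_loss`, the weight vector `alpha`, the focusing parameter `gamma`, and all numeric values are assumptions chosen for illustration. It uses the standard focal-loss form, -alpha_c (1 - p_t)^gamma log(p_t), and is not necessarily the loss used in CWFLAM-VAE.

```python
import math


def class_wise_focal_loss(probs, target, alpha, gamma=2.0):
    """Focal loss for a single sample with a per-class weight.

    probs  : predicted class probabilities for the sample (should sum to 1)
    target : index of the true class
    alpha  : per-class weights (hypothetical; larger values for rarer classes)
    gamma  : focusing parameter; gamma = 0 reduces to weighted cross-entropy
    """
    # Clamp the true-class probability to avoid log(0).
    p_t = min(max(probs[target], 1e-12), 1.0)
    # (1 - p_t)^gamma shrinks the loss of well-classified samples.
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_probs, targets, alpha, gamma=2.0):
    """Average the per-sample loss over a batch."""
    losses = [
        class_wise_focal_loss(p, t, alpha, gamma)
        for p, t in zip(batch_probs, targets)
    ]
    return sum(losses) / len(losses)


# Illustrative two-class setup (assumed, not from the paper):
# class 0 = benign (majority), class 1 = attack (rare).
alpha = [0.25, 0.75]

# A confidently correct benign sample contributes almost nothing...
easy_benign = class_wise_focal_loss([0.95, 0.05], 0, alpha)
# ...while a misclassified rare attack sample dominates the loss.
hard_attack = class_wise_focal_loss([0.70, 0.30], 1, alpha)

batch_loss = mean_focal_loss([[0.95, 0.05], [0.70, 0.30]], [0, 1], alpha)
```

In this sketch the loss for the misclassified attack sample is several orders of magnitude larger than for the easy benign sample, which is the intended effect on imbalanced traffic: training is steered toward rare, hard examples instead of the abundant benign class.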