Title: Attenuating majority attack class bias using hybrid deep learning based IDS framework
Authors: K.G. Raghavendra Narayan, Rakesh Ganesula, Tamminaina Sai Somasekhar, Srijanee Mookherji, Vanga Odelu, Rajendra Prasath, Alavalapati Goutham Reddy
Journal: Journal of Network and Computer Applications, volume 230, Article 103954
Publication date: 2024-07-03
Publication type: Journal Article
DOI: 10.1016/j.jnca.2024.103954
URL: https://www.sciencedirect.com/science/article/pii/S1084804524001310
Impact factor: 7.7 (JCR Q1, Computer Science, Hardware & Architecture)
Citation count: 0
Abstract
In real-time application domains such as finance, healthcare and defence, service delays or information theft may have irrecoverable consequences, so early detection of intrusions is important for preventing security breaches. Anomaly-based intrusion detection using hybrid deep learning approaches has recently become more popular. The most widely used benchmark datasets in the literature, NSL-KDD and UNSW-NB15, are both imbalanced. Models built on imbalanced datasets may be biased towards the majority classes and neglect the minority classes, even though all classes are equally important. In many cases, high accuracy is achieved for the majority classes while class-level performance on the minority classes remains poor. Class balancing thus plays an important role in attenuating prediction bias on imbalanced datasets. In this paper, a Hybrid Deep Learning Based Intrusion Detection (HDLBID) framework is proposed that combines CNN and BiLSTM. Four techniques, namely Random Oversampling (ROS), ADASYN, SMOTE, and SMOTE-Tomek, are used for class balancing in the proposed HDLBID framework. HDLBID with SMOTE-Tomek achieves an overall accuracy of 99.6% on NSL-KDD and 89.02% on UNSW-NB15, an improvement of 13.67% and 10.62%, respectively, over existing recent related models. In addition to overall accuracy, the class-level F1 score is also calculated. A comparative study shows the effectiveness of balancing the datasets compared with leaving them imbalanced, and SMOTE-Tomek class balancing is observed to perform comparatively well. With SMOTE-Tomek class balancing, an improvement of 37.43% is observed in the U2R class of the NSL-KDD dataset and of 61.65% in the Worms class of the UNSW-NB15 dataset. The proposed HDLBID with SMOTE-Tomek class balancing therefore reports the best overall accuracy compared with existing recent related approaches and, in class-level analysis, better results than on the imbalanced versions of the datasets.
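The abstract's central point, that high overall accuracy can hide poor minority-class performance, can be illustrated with a small sketch. The code below is not the authors' implementation: the toy labels, the function names `random_oversample` and `per_class_f1`, and the always-predict-majority classifier are assumptions made for illustration. Random Oversampling is shown because it is the simplest of the four balancing techniques named above; SMOTE-Tomek, which the paper reports as performing best, is not reproduced here.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Random Oversampling: duplicate randomly chosen samples of each
    smaller class until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, with F1 = 2*TP / (2*TP + FP + FN) per class."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Hypothetical toy labels (not taken from NSL-KDD): 98 normal flows, 2 U2R attacks.
y_true = ["normal"] * 98 + ["u2r"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["normal"] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = per_class_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, F1={f1}")

# Balance the (hypothetical) training set before fitting a model.
features = [[float(i)] for i in range(100)]
bal_x, bal_y = random_oversample(features, y_true)
counts = Counter(bal_y)
print(counts)
```

In this toy case the overall accuracy is 0.98 while the F1 score of the minority class is 0, which is why class-level scores are reported alongside accuracy. After oversampling, both classes contain 98 samples.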
About the journal:
The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and applications thereof. Sample topics include new design techniques; interesting or novel applications, components or standards; computer networks with tools such as WWW; emerging standards for internet protocols; wireless networks; mobile computing; emerging computing models such as cloud computing and grid computing; and applications of networked systems for remote collaboration and telemedicine. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded and INSPEC.