A Hybrid Extreme Gradient Boosting and Long Short-Term Memory Algorithm for Cyber Threats Detection

Mendel, Pub Date: 2023-12-20, DOI: 10.13164/mendel.2023.2.307

Reham Amin, Ghada El-Taweel, Ahmed Fouad Ali, Mohamed Tahoun



Abstract

Traditional intrusion detection technologies struggle with vast amounts of data, scale poorly, and achieve low detection rates, so they cannot keep up with evolving and increasingly sophisticated cyber threats. There is therefore an urgent need to detect and stop cyber threats early. Deep learning has greatly improved intrusion detection because it can learn on its own and extract highly accurate features. This paper proposes a hybrid Extreme Gradient Boosting (XGBoost) and Long Short-Term Memory algorithm (HXGBLSTM). A comparative analysis is conducted between the computational performance of six established evolutionary computation algorithms and a recently developed bio-inspired metaheuristic, the Zebra Optimisation Algorithm. The established algorithms include the Particle Swarm Optimisation Algorithm; the bio-inspired Bat, Firefly, and Monarch Butterfly Optimisation Algorithms; and the Genetic Algorithm, an evolutionary algorithm. These metaheuristic methods are used for feature selection (FS) to mitigate the curse of dimensionality, and their results are compared with those of the wrapper-based XGBoost feature selection algorithm. The experiments use the CSE-CIC-IDS2018 dataset, which contains recent network attacks. XGBoost outperformed the other FS algorithms and was therefore adopted as the feature selection algorithm. The effectiveness of the proposed HXGBLSTM is evaluated for both binary and multi-class classification. For cyber threat detection, it outperforms seven innovative deep learning algorithms on binary classification and four of them on multi-class classification. Recall, F1-score, and precision are also used as evaluation criteria. In extensive and detailed experiments on a real dataset, the best accuracy for binary classification is 99.8%, with an F1-score of 99.83%, a precision of 99.85%, and a recall of 99.82%. For multi-class classification, the best accuracy, F1-score, precision, and recall are all close to 100%, which gives the proposed algorithm an advantage over the compared ones.
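The abstract reports accuracy, precision, recall, and F1-score for both binary and multi-class classification. As a minimal sketch of how such figures are conventionally computed from predicted and true labels, the Python below uses the standard textbook definitions. The function names, the toy labels ("attack", "benign"), and the choice of macro averaging for the multi-class case are assumptions made for illustration; the abstract does not state which averaging scheme or tooling the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_metrics(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F1 (assumed scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = [class_metrics(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[i] for s in scores) / len(labels) for i in range(3))


# Hypothetical toy labels, not taken from CSE-CIC-IDS2018.
y_true = ["attack", "attack", "attack", "benign", "benign", "benign", "benign", "benign"]
y_pred = ["attack", "attack", "benign", "benign", "benign", "benign", "benign", "attack"]

print("accuracy:", accuracy(y_true, y_pred))
print("binary (attack as positive):", class_metrics(y_true, y_pred, "attack"))
print("macro average:", macro_metrics(y_true, y_pred))
```

On heavily imbalanced intrusion data, accuracy alone can look high even when a rare attack class is missed, which is one reason to report precision, recall, and F1-score alongside it.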

