Network Anomaly Detection Using LightGBM: A Gradient Boosting Classifier

2020 30th International Telecommunication Networks and Applications Conference (ITNAC) Pub Date : 2020-11-25 DOI:10.1109/ITNAC50341.2020.9315049

Md. Khairul Islam, Prithula Hridi, Md. Shohrab Hossain, Husnu S. Narman


Citations: 9

Abstract

Anomaly detection systems play an important role in recognizing intruders or suspicious activity by detecting unseen and unknown attacks. In this paper, we work on UNSW-NB15, a benchmark network anomaly detection dataset that reflects modern-day network traffic. Previous work on this dataset either lacked a proper validation approach or followed only one evaluation setup, which made it difficult to compare contributions across studies that used the same dataset but different validation steps. We use the machine learning classifier LightGBM to perform binary classification on this dataset, and we present a thorough study of the dataset covering feature engineering, preprocessing, and feature selection. We evaluate our model under several experimental setups used in previous works so that it can be compared directly with them. Using ten-fold cross-validation on the training, test, and combined (training and test) datasets, our model achieves F1 scores of 97.21%, 98.33%, and 96.21%, respectively. The model fitted only on the training data achieves an F1 score of 92.96% on the separate test data, so it also performs well on unseen data. We present comparisons with prior work using all the performance metrics reported there, and we show that our model outperforms them on most metrics and can therefore detect network anomalies better.
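The evaluation protocol named in the abstract (ten-fold cross-validation scored by F1 on a binary label) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names, the stratified fold assignment, and the one-feature threshold model standing in for LightGBM are all assumptions made for illustration. Any classifier exposing a fit step and a per-row predict step could be plugged in instead.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], int]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(y: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split row indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_val_f1(
    X: Sequence[Row],
    y: Sequence[int],
    fit: Callable[[List[Row], List[int]], Predictor],
    k: int = 10,
    seed: int = 0,
) -> List[float]:
    """Fit on k-1 folds, score F1 on the held-out fold, repeat for every fold."""
    scores = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in held_out]
        scores.append(f1_score([y[i] for i in held_out], y_pred))
    return scores


def fit_threshold(X_train: List[Row], y_train: List[int]) -> Predictor:
    """Hypothetical stand-in model (NOT LightGBM): threshold on feature 0,
    placed halfway between the two class means."""
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > thr else 0
```

Calling `cross_val_f1(X, y, fit_threshold)` returns one F1 value per fold, and averaging them gives a single cross-validated score of the kind quoted in the abstract. The separate train/test figure corresponds to calling `fit` once on the training split and scoring the returned predictor on the untouched test split.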

