Network Traffic Anomaly Detection based on Apache Spark

2019 International Conference on Advanced Information Technologies (ICAIT). Pub Date: 2019-11-01. DOI: 10.1109/AITC.2019.8920897

P. H. Pwint, T. Shwe

Citations: 5

Abstract

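Internet and IoT traffic across networks keeps growing, and network anomaly detection systems have become a popular and useful strategy for detecting anomalies, attacks and intrusions. Machine learning techniques can detect network traffic anomalies with reasonable prediction accuracy. However, most previous work has focused on detecting anomalies in traditional machine learning environments. These environments cannot cope with the ever-increasing volume of data and with high-speed networks. In this paper, we investigate the feasibility of applying one of the big data technologies, Apache Spark, to classify different attacks rather than only detecting anomalies. We employ traditional machine learning algorithms, namely Multinomial Logistic Regression, Decision Tree, Random Forest, Multilayer Perceptron and Naïve Bayes. We apply them to a dataset generated from the MAWILab gold standard to classify 15 different attack types. In addition, we investigate the efficiency of Apache Spark in terms of accuracy and speed under varied Spark configuration settings. Our results demonstrate that employing big data technologies benefits a network traffic anomaly detector more than a traditional machine learning environment does, in terms of both prediction accuracy and execution time.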
