Internet Traffic Detection using Naïve Bayes and K-Nearest Neighbors (KNN) algorithm

2019 International Conference on Intelligent Computing and Control Systems (ICCS) | Pub Date: 2019-05-01 | DOI: 10.1109/ICCS45141.2019.9065655

M. Dixit, R. Sharma, Saniya Shaikh, Krutika Muley


Citations: 10

Abstract

The growth of the internet has increased both the number of users and the volume of usage. Despite its advantages, this exponential rise in usage has produced excess data flow that floods the network. Maintaining quality of service and speed, ensuring data security, and preventing data misuse all require analysis of internet traffic. Analysing traffic involves classifying it into different types, which can be done by inspecting packets on the basis of port numbers, payload information, or statistical features. This paper discusses the analysis of internet traffic using statistical features such as inter-packet arrival time, time to live (TTL), and number of packets. Because these features do not require inspecting packet contents, the approach helps protect user privacy. To automate traffic categorization, two supervised machine-learning classifiers, Naïve Bayes and K-Nearest Neighbors (KNN), are implemented. Experiments were performed to find the highest accuracy achievable when classifying internet traffic on the basis of transaction protocol. The dataset used is UNSW-NB. The results show that K-Nearest Neighbors achieves an accuracy of 85%, whereas the maximum accuracy achieved with Naïve Bayes is 54%.
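The two classifiers named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy flows below are invented rather than drawn from UNSW-NB, and the Gaussian form of Naïve Bayes, the Euclidean distance, and k = 3 are all assumptions, since the abstract does not specify them. Features are also left unscaled for brevity.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy flows: (inter-packet arrival time in ms, TTL, packet count)
# mapped to a protocol label. Values are invented, not taken from UNSW-NB.
TRAIN = [
    ((0.5, 64, 120), "tcp"), ((0.7, 64, 150), "tcp"), ((0.6, 62, 110), "tcp"),
    ((5.0, 128, 10), "udp"), ((4.5, 128, 12), "udp"), ((5.5, 126, 8), "udp"),
]

def knn_predict(train, x, k=3):
    """Majority vote among the k training flows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def fit_gaussian_nb(train):
    """Estimate a prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(train), means, variances)
    return model

def nb_predict(model, x):
    """Return the class with the highest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

query = (0.65, 63, 130)  # an unseen flow resembling the "tcp" examples
print(knn_predict(TRAIN, query))
print(nb_predict(fit_gaussian_nb(TRAIN), query))
```

On real traffic the features would normally be scaled before computing KNN distances, because packet counts would otherwise dominate the inter-packet arrival time.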


