Data Mining Implementation for Detection of Anomalies in Network Traffic Packets Using Outlier Detection Approach

JIKO (Jurnal Informatika dan Komputer), Pub Date: 2023-08-06, DOI: 10.33387/jiko.v6i2.6092

Kurnia Setiawan, Arief Wibowo

Citations: 0

Abstract

Network traffic produces a large number of packet records, which can be used both to evaluate the quality of a network and to analyze anomalies in it, whether related to network security or to network performance. Based on the data obtained, however, anomalies in computer networks cannot be traced to the specific traffic packets in which they occur. Monitoring network traffic packets manually would require a great deal of time and resources, which makes it difficult to pinpoint potential anomaly events. This study analyzes network packet traffic data with an outlier detection approach to find the records that contain anomalies. The Isolation Forest algorithm was used to detect outliers in the packet data, identifying a minority of 1,643 records (4.86%) as outliers and 32,098 records (95.13%) as inliers. The expert attributes, which contain expert information, were then checked and filtered. The outlier detection results were classified with five algorithms for comparison: Random Forest Classifier, Support Vector Machine, Decision Tree Classifier, K-Nearest Neighbor, and Bernoulli Naive Bayes. Random Forest achieved the highest accuracy (0.9962067330488383), macro-average precision (0.78), and macro-average F1-score (0.82). The classification model can classify samples with the labels "inliers", "outliers", "Error", and "warning outliers". Two labels have comparatively weak precision, recall, and F1-scores: "Error" (0.50, 1.00, and 0.67) and "warning outliers" (0.64, 0.70, and 0.67). The resulting classification model was used to develop a prototype that helps investigate potential network traffic packet anomalies more specifically.
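The abstract names Isolation Forest but does not give the features, hyperparameters, or library the authors used. The following is only a minimal, standard-library Python sketch of the general technique, not the authors' implementation. Each tree isolates records with random axis-aligned splits, and a record's anomaly score is s = 2^(-E[h(x)] / c(ψ)), where E[h(x)] is its average path length across trees and c(ψ) is the average path length of an unsuccessful binary-search-tree search over a subsample of size ψ. Records that are isolated in few splits score close to 1. The per-packet features (length, inter-arrival time), the tree count, the subsample size of 256, and the 5% contamination cut-off are all hypothetical choices for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points,
    used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    # Only split on features that still vary within this node
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the expected remaining depth."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per record; values near 1 mean easy to isolate."""
    if len(data) < 2:
        raise ValueError("need at least two records")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit ceil(log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / norm))
    return scores


def label_outliers(scores, contamination=0.05):
    """Label the highest-scoring `contamination` fraction as outliers."""
    k = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[k - 1]
    return ["outliers" if s >= threshold else "inliers" for s in scores]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical per-packet features: (length in bytes, inter-arrival ms)
    packets = [(rng.gauss(500, 50), rng.gauss(10, 2)) for _ in range(200)]
    packets.append((9000.0, 900.0))  # one injected extreme packet
    scores = isolation_forest_scores(packets)
    labels = label_outliers(scores)
    print(labels.count("outliers"), labels[-1], round(scores[-1], 3))
```

In practice an existing implementation such as scikit-learn's `sklearn.ensemble.IsolationForest` would more likely be used; its `contamination` parameter plays the same role as the cut-off in `label_outliers` here.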

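The reported per-label scores are internally consistent, because F1 is the harmonic mean of precision and recall: for "Error", 2(0.50)(1.00)/(0.50 + 1.00) ≈ 0.67, and for "warning outliers", 2(0.64)(0.70)/(0.64 + 0.70) ≈ 0.67. The standard-library Python sketch below computes per-label precision, recall, and F1, their unweighted (macro) averages, and accuracy from lists of true and predicted labels. Macro averaging here means the plain mean of the per-label scores, which is how scikit-learn defines `average="macro"`; the abstract does not say which tool the authors used. The toy label lists are hypothetical, not data from the study.

```python
from collections import Counter


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_label_report(y_true, y_pred):
    """Map each label to its (precision, recall, F1) tuple."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)  # how often each label was predicted
    n_true = Counter(y_true)  # how often each label actually occurs
    report = {}
    for label in labels:
        precision = tp[label] / n_pred[label] if n_pred[label] else 0.0
        recall = tp[label] / n_true[label] if n_true[label] else 0.0
        report[label] = (precision, recall, f1_from(precision, recall))
    return report


def macro_average(report):
    """Unweighted mean of per-label precision, recall and F1."""
    n = len(report)
    return tuple(sum(scores[i] for scores in report.values()) / n
                 for i in range(3))


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Reported per-label precision and recall reproduce the reported F1
    print(round(f1_from(0.50, 1.00), 2))  # 0.67  ("Error")
    print(round(f1_from(0.64, 0.70), 2))  # 0.67  ("warning outliers")

    # Hypothetical toy labels, not data from the study
    y_true = ["inliers", "inliers", "outliers", "Error",
              "warning outliers", "inliers"]
    y_pred = ["inliers", "inliers", "outliers", "warning outliers",
              "warning outliers", "inliers"]
    report = per_label_report(y_true, y_pred)
    print(macro_average(report), accuracy(y_true, y_pred))
```

Because a macro average weights every label equally, a few weak minority labels such as "Error" can pull macro precision and macro F1 well below an accuracy that is dominated by the large "inliers" class.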