Traffic classification using clustering algorithms

Annual ACM Workshop on Mining Network Data. Pub Date: 2006-09-11. DOI: 10.1145/1162678.1162679

Jeffrey Erman, M. Arlitt, A. Mahanti

Citations: 766

Abstract

Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult, because many peer-to-peer (P2P) applications use dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of similar traffic using only transport-layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have not previously been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly than AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
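The contrast between the two algorithms can be illustrated with a minimal pure-Python sketch. Everything in it is an assumption made for illustration rather than the authors' setup: the nine flows are synthetic, each flow is reduced to two hypothetical, already-normalized transport-layer features, the function names are invented, K-Means uses a deterministic farthest-first initialization, and accuracy is measured by labeling each cluster with its majority class while treating DBSCAN noise as unclassified.

```python
import math
from collections import Counter


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns one cluster index per point."""
    # Deterministic farthest-first initialization (a simplification chosen
    # for reproducibility; random restarts are more common in practice).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def dbscan(points, eps, min_pts):
    """Density-based clustering; returns one cluster index per point, -1 = noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels


def majority_accuracy(cluster_ids, classes):
    """Fraction of flows whose cluster's majority class equals their own class.
    Noise points (cluster id -1) stay unclassified and count as misses."""
    votes = {}
    for cid, cls in zip(cluster_ids, classes):
        if cid != -1:
            votes.setdefault(cid, Counter())[cls] += 1
    hits = sum(
        votes[cid].most_common(1)[0][0] == cls
        for cid, cls in zip(cluster_ids, classes)
        if cid != -1
    )
    return hits / len(classes)


# Synthetic flows: (feature_1, feature_2), both hypothetical and pre-normalized.
flows = [
    (0.10, 0.12), (0.12, 0.10), (0.11, 0.14), (0.13, 0.11),  # tight "web" group
    (0.80, 0.85), (0.82, 0.80), (0.85, 0.83), (0.81, 0.88),  # tight "p2p" group
    (0.45, 0.40),                                            # atypical "web" flow
]
classes = ["web"] * 4 + ["p2p"] * 4 + ["web"]

km = kmeans(flows, k=2)
db = dbscan(flows, eps=0.1, min_pts=3)
print("K-Means:", km, majority_accuracy(km, classes))
print("DBSCAN: ", db, majority_accuracy(db, classes))
```

On this toy data K-Means must place the atypical flow in one of its two clusters, while DBSCAN leaves it as noise. That costs DBSCAN one classified flow but keeps both of its clusters tight, which is one way to read the abstract's remark about lower accuracy but better clusters. The toy numbers say nothing about the paper's actual results.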
