Clustering and Feature Selection Technique for Improving Internet Traffic Classification Using K-NN

Journal of Advances in Computer Networks, published 2016-03-01, DOI: 10.18178/JACN.2016.4.1.198

Trianggoro Wiradinata, A. Paramita


Citations: 4

Abstract

This research uses the K-Nearest Neighbour (K-NN) algorithm to classify internet data traffic. K-NN is suitable for large amounts of data and can produce accurate classifications, but it has a high computational cost because it calculates the distance to every existing data point. One way to overcome this weakness is to perform clustering before classification, since the clustering process does not require high computing time. The clustering algorithm used is Fuzzy C-Means (FCM). The FCM algorithm does not require the number of clusters to be determined beforehand; the clusters form naturally from the input dataset. However, FCM has the disadvantage that its clustering results often differ between runs even on the same input data, because the initial dataset given to FCM is not optimal. To optimize the initial dataset, this research applies a feature selection algorithm; once the main features of the dataset are selected, the FCM output becomes consistent. Feature selection is expected to provide an optimal initial dataset for the FCM algorithm. The feature selection algorithm used in this study is Principal Component Analysis (PCA), which removes non-significant attributes to create an optimal dataset and can improve the performance of both the clustering and the classification algorithms. The result of this study is a method that combines classification, clustering, and feature extraction; these three methods were successfully modeled into a classification method for internet bandwidth usage data with high accuracy and fast performance.
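The cost argument above can be made concrete with a minimal NumPy sketch, assuming Euclidean distance and majority voting. `knn_predict` is brute-force K-NN: it computes one distance per training row. `clustered_knn_predict` illustrates the cluster-then-classify idea by first picking the nearest cluster center and then searching only that cluster's rows. Both function names, the toy data, and the use of group means as stand-in cluster centers are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Brute-force K-NN: computes the distance from x to every training row."""
    dists = np.linalg.norm(X_train - x, axis=1)   # n distance computations
    nearest = np.argsort(dists)[:k]                # indices of the k closest rows
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]              # majority label


def clustered_knn_predict(X_train, y_train, centers, cluster_of, x, k=3):
    """Restrict the K-NN search to the cluster whose center is nearest to x.

    `centers` and `cluster_of` (the cluster index of each training row) are
    assumed to come from a prior clustering step such as Fuzzy C-Means.
    """
    c = int(np.argmin(np.linalg.norm(centers - x, axis=1)))  # c distance computations
    in_cluster = cluster_of == c
    return knn_predict(X_train[in_cluster], y_train[in_cluster], x, k)


# Hypothetical toy data: two well-separated groups standing in for traffic classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Stand-in cluster centers (group means); a real pipeline would use FCM centers.
centers = np.array([X[:50].mean(axis=0), X[50:].mean(axis=0)])
cluster_of = np.argmin(
    np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)

query = np.array([4.8, 5.1])
print(knn_predict(X, y, query),
      clustered_knn_predict(X, y, centers, cluster_of, query))
```

With c clusters of roughly equal size, the pruned search needs about c + n/c distance computations instead of n. The trade-off is that it is approximate: a query near a cluster boundary can have true nearest neighbours in a neighbouring cluster that are never examined.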
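The Fuzzy C-Means step can be sketched as follows, assuming the standard (Bezdek) formulation with fuzzifier `m = 2` and Euclidean distance; the function name, defaults, and toy data are hypothetical. Note that in this formulation the number of clusters `c` is an explicit input, and the memberships start from a random matrix, which is one reason two runs on the same data can disagree. Fixing `seed` makes a run reproducible.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=None):
    """Standard Fuzzy C-Means. Returns (centers, U), where U[i, j] is the
    membership of sample i in cluster j and every row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # random initial memberships
    for _ in range(max_iter):
        W = U ** m                                # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)               # guard against zero distance
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


# Hypothetical toy data: two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

centers, U = fuzzy_c_means(X, c=2, seed=42)
labels = U.argmax(axis=1)   # hard label = highest membership
print(np.round(centers, 2))
```

A hard cluster assignment, as needed for restricting a K-NN search, can be taken as the highest-membership cluster for each row (`U.argmax(axis=1)`). Cluster indices are arbitrary, so a different seed may return the same partition with the indices swapped.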
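The PCA step can be sketched with NumPy's SVD. Strictly, PCA is feature extraction (it projects the data onto new axes that are combinations of the original attributes) rather than selection of a subset of them, which matches the abstract's closing description. The function name, the choice of one component, and the synthetic data are assumptions for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project X onto its first n_components principal components (via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # PCA works on centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # rows = principal axes
    explained = S ** 2 / np.sum(S ** 2)            # variance ratio per component
    Z = Xc @ components.T                          # reduced dataset
    return Z, components, explained[:n_components]


# Hypothetical 3-attribute data: the second attribute is nearly 2x the first,
# the third is almost constant, so one component carries most of the variance.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1,
                     2.0 * x1 + rng.normal(scale=0.05, size=200),
                     rng.normal(scale=0.01, size=200)])

Z, components, explained = pca_reduce(X, n_components=1)
print(Z.shape, np.round(explained, 4))
```

In practice the number of components is often chosen from the cumulative explained-variance ratio, and features measured in different units (bytes, packets, seconds) are commonly standardized before PCA so that no single attribute dominates the variance.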



