Clustering and Feature Selection Technique for Improving Internet Traffic Classification Using K-NN

Trianggoro Wiradinata, A. Paramita
{"title":"Clustering and Feature Selection Technique for Improving Internet Traffic Classification Using K-NN","authors":"Trianggoro Wiradinata, A. Paramita","doi":"10.18178/JACN.2016.4.1.198","DOIUrl":null,"url":null,"abstract":"This research will use the algorithm K-Nearest Neighbour (K-NN) to classify internet data traffic, K-NN is suitable for large amounts of data and can produce a more accurate classification, K-NN algorithm has a weakness takes computing high because K-NN algorithm calculating the distance of all existing data. One solution to overcome these weaknesses is to do the clustering process before the classification process, because the clustering process does not require high computing time, clustering algorithm that can be used is Fuzzy C-Mean algorithm, the Fuzzy C-Mean algorithm does not need to be determined in first number of clusters to be formed, clusters that form on this algorithm will be formed naturally based datasets be entered, but the algorithm Fuzzy C-Mean has the disadvantage of clustering results obtained are often not the same even though the same input data, this is because the initial dataset that of the Fuzzy C-Mean is not optimal, to optimize initial datasets in this research using feature selection algorithm, after main feature of dataset selected the output from fuzzy C-Mean become consistent. Selection of the features is a method that is expected to provide an initial dataset that is optimum for the algorithm Fuzzy C-Means. Algorithms for feature selection in this study used are Principal Component Analysis (PCA). PCA reduced non significant attribute to created optimal dataset and can improve performance clustering and classification algorithm. Results in this study is an combining method of classification, clustering and feature extraction of data, these three methods successfully modeled to generate a data classification method of internet bandwidth usage that has high accuracy and have a fast performance.","PeriodicalId":232851,"journal":{"name":"Journal of Advances in Computer Networks","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Advances in Computer Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18178/JACN.2016.4.1.198","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

This research will use the algorithm K-Nearest Neighbour (K-NN) to classify internet data traffic, K-NN is suitable for large amounts of data and can produce a more accurate classification, K-NN algorithm has a weakness takes computing high because K-NN algorithm calculating the distance of all existing data. One solution to overcome these weaknesses is to do the clustering process before the classification process, because the clustering process does not require high computing time, clustering algorithm that can be used is Fuzzy C-Mean algorithm, the Fuzzy C-Mean algorithm does not need to be determined in first number of clusters to be formed, clusters that form on this algorithm will be formed naturally based datasets be entered, but the algorithm Fuzzy C-Mean has the disadvantage of clustering results obtained are often not the same even though the same input data, this is because the initial dataset that of the Fuzzy C-Mean is not optimal, to optimize initial datasets in this research using feature selection algorithm, after main feature of dataset selected the output from fuzzy C-Mean become consistent. Selection of the features is a method that is expected to provide an initial dataset that is optimum for the algorithm Fuzzy C-Means. Algorithms for feature selection in this study used are Principal Component Analysis (PCA). PCA reduced non significant attribute to created optimal dataset and can improve performance clustering and classification algorithm. Results in this study is an combining method of classification, clustering and feature extraction of data, these three methods successfully modeled to generate a data classification method of internet bandwidth usage that has high accuracy and have a fast performance.
基于K-NN改进互联网流量分类的聚类和特征选择技术
本研究将使用k -最近邻(K-NN)算法对互联网数据流量进行分类,K-NN适用于大量数据,可以产生更准确的分类,K-NN算法有一个缺点,因为K-NN算法计算所有现有数据的距离,计算量很高。克服这些缺点的一种解决方法是在分类过程之前做聚类过程,因为聚类过程不需要很高的计算时间,可以使用的聚类算法是模糊c -均值算法,模糊c -均值算法不需要在首先确定要形成的聚类数量,在此算法上形成的聚类自然会形成基于数据集的输入。但模糊C-Mean算法的缺点是即使输入相同的数据,得到的聚类结果往往也不相同,这是因为模糊C-Mean的初始数据集不是最优的,本研究使用特征选择算法来优化初始数据集,在数据集的主要特征选择后,从模糊C-Mean中输出的结果趋于一致。特征的选择是一种期望为模糊c均值算法提供最优初始数据集的方法。本研究中使用的特征选择算法是主成分分析(PCA)。PCA通过减少非显著属性来创建最优数据集,可以提高聚类和分类算法的性能。本研究的结果是将数据的分类、聚类和特征提取相结合,成功地对这三种方法进行建模,生成了一种准确率高、性能快的互联网带宽使用数据分类方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信