Comparative Analysis of Hybrid Clustering Algorithm on Different Dataset

2018 8th International Conference on Electronics Information and Emergency Communication (ICEIEC). Pub Date: 2018-06-01. DOI: 10.1109/ICEIEC.2018.8473568

H. Malik, N. Laghari, D. Sangrasi, Z. Dayo

Citations: 9

Abstract

Clustering is a data mining technique in which data are grouped according to similarity and dissimilarity. It is commonly used to identify hidden patterns in complex multidimensional data, and these patterns provide a basis for decision making. The objective of this research is to find the best clustering algorithm. K-Mean is a well-known clustering algorithm that is simple and easy to implement, but its drawback is that it does not work well with higher-dimensional data. To address this drawback, K-Mean is combined with other techniques, such as PSO (Particle Swarm Optimization) and PCA (Principal Component Analysis), for better results and cluster identification. In this paper, the authors use a hybrid clustering approach (K-Mean, PSO-K-Mean, and PCA-K-Mean) to improve the clustering results, evaluated by purity, Rand index, and computation time, on different data sets taken from the UCI Repository.
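The abstract names purity and Rand index as evaluation parameters but does not define them. The following is a minimal sketch of the standard textbook definitions, assuming ground-truth class labels are available for each data point; the function names `purity` and `rand_index` and the toy label lists are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, Counter())[t] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees when it is grouped together in both labelings
    or separated in both labelings.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy labels (hypothetical, not data from the paper).
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]

print(f"purity     = {purity(truth, found):.4f}")
print(f"rand index = {rand_index(truth, found):.4f}")
```

Both measures range from 0 to 1, with higher values indicating closer agreement with the reference labels. The pairwise Rand index here is quadratic in the number of points, which is acceptable for small UCI-style data sets but would likely need a contingency-table formulation for larger ones.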

