K-modes and Entropy Cluster Centers Initialization Methods
Doaa S. Ali, Ayman Ghoneim, M. Saleh
International Conference on Operations Research and Enterprise Systems (published 2017-04-21)
DOI: 10.5220/0006245504470454
Citations: 2
Abstract
Data clustering is an important unsupervised data-mining technique that aims to extract the natural partitions of a dataset without a priori class information. Center-based clustering models, however, are highly sensitive to their randomly initialized centers, because the initial clusters directly shape the final ones. Determining the initial cluster centers is therefore an important issue in such models. Previous work has shown that a multiobjective clustering model that uses several clustering validity indices (e.g., the MODEK-Modes model) yields more accurate results than one that relies on a single validity index. In this study, we improve the performance of the MODEK-Modes model by introducing two new initialization methods: the K-Modes initialization method and the entropy initialization method. Both methods are tested on ten real-life benchmark datasets from the UCI Machine Learning Repository. Experimental results show that both initialization methods significantly improve clustering performance compared with existing initialization methods.
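The abstract's central premise is that the starting centers shape the final clusters. The Python sketch below illustrates that premise with a plain k-modes loop (simple matching dissimilarity, per-attribute mode updates) on a hypothetical six-record toy dataset. It is not the MODEK-Modes model or either of the paper's proposed initialization methods, whose details the abstract does not give; the function names `mismatch` and `k_modes`, the toy data, and the tie-breaking rules are assumptions made for illustration only.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(data, init_modes, max_iter=100):
    """Basic k-modes run from caller-supplied initial modes (illustrative sketch).

    Returns (labels, modes, cost); cost is the total mismatch between
    each record and the mode of the cluster it is assigned to.
    """
    modes = [tuple(m) for m in init_modes]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest mode wins; ties go to the lowest index.
        new_labels = [min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
                      for r in data]
        if new_labels == labels:
            break  # no record changed cluster, so the run has converged
        labels = new_labels
        # Update step: each mode takes the most frequent value per attribute.
        for j in range(len(modes)):
            members = [r for r, lab in zip(data, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous mode
                modes[j] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    cost = sum(mismatch(r, modes[lab]) for r, lab in zip(data, labels))
    return labels, modes, cost

# Hypothetical toy categorical data: three natural groups of two records each.
data = [("a", "a"), ("a", "b"), ("c", "c"), ("c", "d"), ("e", "e"), ("e", "f")]

# One seed per group versus two seeds drawn from the same group.
good = k_modes(data, [data[0], data[2], data[4]])
bad = k_modes(data, [data[0], data[1], data[2]])
print("one seed per group:", good)
print("two seeds in one group:", bad)
```

Seeded with one record from each group, the run recovers the three natural groups with a total mismatch of 3. Seeded with two records from the same group, it stops at a worse local optimum (cost 4) in which the `("a", "a")` record shares a cluster with both `e` records, the kind of outcome a better initialization method aims to avoid.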