A New Weight Based Density Peaks Clustering Algorithm for Numerical and Categorical Data
Wuning Tong, Yuping Wang, Junkun Zhong, Weipeng P. Yan
2017 13th International Conference on Computational Intelligence and Security (CIS), published 2017-12-01
DOI: 10.1109/CIS.2017.00044 (https://doi.org/10.1109/CIS.2017.00044)
Citation count: 2
Abstract
Discovering the underlying group structure of objects is of crucial importance in data mining. Most existing clustering approaches apply only to purely numerical or purely categorical data. Only a few recent approaches can handle both numerical and categorical attributes, and these often incur a high computational cost. To cluster data with both numerical and categorical attributes efficiently, this paper proposes a new approach consisting of the following steps. First, a measure of the importance of each categorical attribute is designed, and a method for generating the weight of each categorical attribute is proposed based on this measure. Then a unified distance metric is proposed by combining, with weights, the distance for the numerical part and that for the categorical part. Furthermore, by incorporating the new weights into the method in [1], an improved density peaks clustering algorithm is presented. Finally, experimental results show the efficiency of the proposed approach.
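
The abstract does not give the importance measure, the weight formula, or the exact form of the unified distance, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes (hypothetically) that attribute importance is the Shannon entropy of each categorical column, that the unified distance is the Euclidean distance on the numerical part plus a weighted mismatch count on the categorical part, and that the density peaks step uses the usual cutoff-kernel local density together with the distance to the nearest point of higher density. The function names `entropy_weights`, `mixed_distance`, and `density_peaks` are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def entropy_weights(cat_rows):
    """Assumed weighting: Shannon entropy per categorical column, normalised to sum to 1."""
    n = len(cat_rows)
    n_cols = len(cat_rows[0])
    raw = []
    for j in range(n_cols):
        counts = Counter(row[j] for row in cat_rows)
        raw.append(sum(-(c / n) * math.log(c / n) for c in counts.values()))
    total = sum(raw)
    if total == 0:
        # Every column is constant: fall back to uniform weights.
        return [1.0 / n_cols] * n_cols
    return [h / total for h in raw]


def mixed_distance(a_num, a_cat, b_num, b_cat, weights):
    """Assumed unified distance: Euclidean part plus weighted categorical mismatches."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(a_num, b_num)))
    cat = sum(w for w, x, y in zip(weights, a_cat, b_cat) if x != y)
    return num + cat


def density_peaks(dist, d_c):
    """Cutoff-kernel local density rho and distance delta to the nearest denser point."""
    n = len(dist)
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbour get their largest distance to any point.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


# Toy data (invented for illustration): one numerical and two categorical attributes.
num = [[0.0], [0.1], [0.2], [5.0], [5.1]]
cat = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "x"), ("b", "x")]

weights = entropy_weights(cat)
dist = [
    [mixed_distance(num[i], cat[i], num[j], cat[j], weights) for j in range(len(num))]
    for i in range(len(num))
]
rho, delta = density_peaks(dist, d_c=0.5)
```

In this toy data the second categorical column is constant, so it receives zero weight and contributes nothing to the distance. Points with both a large `rho` and a large `delta` would be the candidate cluster centres; with a cutoff kernel, ties in `rho` can give several points of one group a large `delta`, which a real implementation would need to handle.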