{"title":"基于最大频繁项集的分类数据聚类","authors":"Dadong Yu, Dongbo Liu, Rui Luo, Jianxin Wang","doi":"10.1109/ICMLA.2007.11","DOIUrl":null,"url":null,"abstract":"Clustering categorical data received more attention since recent years, but several aspects of the existing algorithms, such as the interpretabilities of found clusters, the impact of data selection orders, are not well solved. A novel categorical data clustering algorithm called CLUBMIS is proposed in this paper, which can effectively find the interesting clusters. In addition, the clusters can be easily interpreted by the maximal frequent itemsets used in the clustering process. Different from most of the hierarchical clustering algorithm, CLUBMIS clusters datasets based on the summarized information, i.e. maximal frequent itemsets, thus it eliminates the effect of different data selection order.","PeriodicalId":448863,"journal":{"name":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Clustering Categorical Data Based on Maximal Frequent Itemsets\",\"authors\":\"Dadong Yu, Dongbo Liu, Rui Luo, Jianxin Wang\",\"doi\":\"10.1109/ICMLA.2007.11\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Clustering categorical data received more attention since recent years, but several aspects of the existing algorithms, such as the interpretabilities of found clusters, the impact of data selection orders, are not well solved. A novel categorical data clustering algorithm called CLUBMIS is proposed in this paper, which can effectively find the interesting clusters. In addition, the clusters can be easily interpreted by the maximal frequent itemsets used in the clustering process. Different from most of the hierarchical clustering algorithm, CLUBMIS clusters datasets based on the summarized information, i.e. maximal frequent itemsets, thus it eliminates the effect of different data selection order.\",\"PeriodicalId\":448863,\"journal\":{\"name\":\"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA.2007.11\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sixth International Conference on Machine Learning and Applications (ICMLA 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA.2007.11","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Clustering Categorical Data Based on Maximal Frequent Itemsets
Clustering categorical data received more attention since recent years, but several aspects of the existing algorithms, such as the interpretabilities of found clusters, the impact of data selection orders, are not well solved. A novel categorical data clustering algorithm called CLUBMIS is proposed in this paper, which can effectively find the interesting clusters. In addition, the clusters can be easily interpreted by the maximal frequent itemsets used in the clustering process. Different from most of the hierarchical clustering algorithm, CLUBMIS clusters datasets based on the summarized information, i.e. maximal frequent itemsets, thus it eliminates the effect of different data selection order.