Clustering Categorical Data Based on Maximal Frequent Itemsets

Sixth International Conference on Machine Learning and Applications (ICMLA 2007). Pub Date: 2007-12-13. DOI: 10.1109/ICMLA.2007.11

Dadong Yu, Dongbo Liu, Rui Luo, Jianxin Wang


Citations: 4

Abstract




Clustering of categorical data has received growing attention in recent years, but several aspects of existing algorithms, such as the interpretability of the clusters found and the impact of data selection order, are not well solved. This paper proposes a novel categorical data clustering algorithm called CLUBMIS, which can effectively find interesting clusters. In addition, the clusters can be easily interpreted through the maximal frequent itemsets used in the clustering process. Unlike most hierarchical clustering algorithms, CLUBMIS clusters datasets based on summarized information, i.e. maximal frequent itemsets, and thus eliminates the effect of different data selection orders.
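To make the central notion concrete, the following is a minimal sketch of what a maximal frequent itemset is for categorical records: a frequent itemset with no frequent proper superset. It is not the CLUBMIS algorithm, whose clustering steps the abstract does not describe. The function names, the `attribute=value` item encoding, the toy records, and the support threshold are all assumptions made for illustration, and the brute-force enumeration is only practical for very small inputs.

```python
from itertools import combinations


def frequent_itemsets(records, min_support):
    """Return {itemset: support count} for every itemset that appears in
    at least min_support records (brute-force enumeration)."""
    items = sorted({item for rec in records for item in rec})
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            count = sum(1 for rec in records if cand <= rec)
            if count >= min_support:
                frequent[cand] = count
                found_any = True
        if not found_any:
            # Support is anti-monotone: if no itemset of this size is
            # frequent, no larger itemset can be frequent either.
            break
    return frequent


def maximal_frequent_itemsets(records, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(records, min_support)
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical toy data: each record is a set of attribute=value items.
records = [
    {"color=red", "shape=round", "size=small"},
    {"color=red", "shape=round", "size=large"},
    {"color=red", "shape=round", "size=small"},
    {"color=blue", "shape=square", "size=large"},
    {"color=blue", "shape=square", "size=small"},
]

mfis = maximal_frequent_itemsets(records, min_support=2)
# The itemsets depend only on support counts, so the order in which
# records are visited does not change the result.
mfis_reversed = maximal_frequent_itemsets(list(reversed(records)), min_support=2)
```

Because the itemsets are derived from support counts over the whole dataset, `mfis` and `mfis_reversed` are identical, which illustrates why a method built on such summarized information can be insensitive to data selection order. Each itemset also reads directly as a description of the records that contain it, for example "red, round, small".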

