THUS: An Efficient Two-stage Hierarchical Algorithm for Categorical Data Clustering
Authors: Xuedong Gao, Minghan Yang, Guiying Wei
Published in: 2018 8th International Conference on Logistics, Informatics and Service Sciences (LISS), 2018-08-01
DOI: 10.1109/LISS.2018.8593256 (https://doi.org/10.1109/LISS.2018.8593256)
The pursuit of both quality and efficiency in clustering analysis is a long-standing paradox. In real-world applications, a method that offers a controllable quality-efficiency trade-off may be more practical. Hierarchical algorithms usually achieve better clustering quality but are much more computationally expensive than partitioning algorithms. In this paper, we propose an efficient two-stage hierarchical algorithm for categorical data clustering (THUS) that improves efficiency while maintaining acceptable quality. In the first stage, several efficient methods are used to generate intermediate clusters, which reduces the complexity of the hierarchical second stage. Experimental results show that the proposed algorithm reduces computational time considerably, and that its clustering quality can be equivalent to that of the original hierarchical algorithm. By adjusting the pre-clustering level, the trade-off between clustering quality and efficiency can be controlled to suit the application.
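The two-stage idea can be illustrated with a minimal Python sketch. The abstract does not say which methods THUS uses in its first stage or which linkage it uses in the second, so everything here is an assumption made for illustration: stage one is a single-pass leader clustering under Hamming distance, stage two is naive average-linkage agglomeration, and the names `pre_cluster`, `two_stage_cluster`, and the `radius` parameter (standing in for the pre-clustering level) are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def pre_cluster(records, radius):
    """Stage 1 (assumed method): single-pass leader clustering.

    Each record joins the first leader within `radius` mismatches;
    otherwise it becomes a new leader. Returns lists of record indices.
    """
    leaders, groups = [], []
    for i, rec in enumerate(records):
        for g, leader in enumerate(leaders):
            if hamming(rec, leader) <= radius:
                groups[g].append(i)
                break
        else:
            leaders.append(rec)
            groups.append([i])
    return groups


def average_linkage(records, a, b):
    """Mean Hamming distance between the members of two groups."""
    total = sum(hamming(records[i], records[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def two_stage_cluster(records, k, radius):
    """Stage 2 (assumed linkage): merge intermediate groups down to k."""
    groups = pre_cluster(records, radius)
    while len(groups) > k:
        # Naive scan over all pairs of groups for the closest pair.
        p, q = min(
            combinations(range(len(groups)), 2),
            key=lambda pq: average_linkage(
                records, groups[pq[0]], groups[pq[1]]
            ),
        )
        groups[p] = groups[p] + groups[q]  # p < q, so deleting q is safe
        del groups[q]
    return [sorted(g) for g in groups]
```

With `radius=0` only identical records are grouped in stage one, so the hierarchical stage does almost all of the work. A larger `radius` leaves fewer intermediate groups for the expensive pairwise stage, at the risk of coarser clusters. That is the kind of quality-efficiency knob the abstract describes, though the paper's actual mechanism may differ.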