Shinwon Lee, Wonhee Lee, Sung-Jong Chung, D. An, Ingeun Bok, Hongjin Ryu
{"title":"Selection of Cluster Hierarchy Depth in Hierarchical Clustering Using K-Means Algorithm","authors":"Shinwon Lee, Wonhee Lee, Sung-Jong Chung, D. An, Ingeun Bok, Hongjin Ryu","doi":"10.1109/ISITC.2007.84","DOIUrl":null,"url":null,"abstract":"Many papers have shown that the hierarchical clustering method takes good-performance, but is limited because of its quadratic time complexity. In contrast, with a large number of variables, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Think of the factor of simplify, high-quality and high-efficiency, we combine the two approaches providing a new system named CONDOR system with hierarchical structure based on document clustering using K-means algorithm. Evaluated the performance on different hierarchy depth and initial uncertain centroid number based on variational relative document amount correspond to given queries. Comparing with regular method that the initial centroids have been established in advance, our method performance has been improved a lot.","PeriodicalId":394071,"journal":{"name":"2007 International Symposium on Information Technology Convergence (ISITC 2007)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 International Symposium on Information Technology Convergence (ISITC 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISITC.2007.84","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Many papers have shown that the hierarchical clustering method takes good-performance, but is limited because of its quadratic time complexity. In contrast, with a large number of variables, K-means has a time complexity that is linear in the number of documents, but is thought to produce inferior clusters. Think of the factor of simplify, high-quality and high-efficiency, we combine the two approaches providing a new system named CONDOR system with hierarchical structure based on document clustering using K-means algorithm. Evaluated the performance on different hierarchy depth and initial uncertain centroid number based on variational relative document amount correspond to given queries. Comparing with regular method that the initial centroids have been established in advance, our method performance has been improved a lot.