Scalable Clustering with smoka
J. Kogan
2007 International Conference on Computing: Theory and Applications (ICCTA'07), published 2007-03-05
DOI: 10.1109/ICCTA.2007.114 (https://doi.org/10.1109/ICCTA.2007.114)

The paper reports a multi-step clustering procedure equipped with a divergence (a distance-like function derived from a convex function). The first step of the procedure is a BIRCH-like algorithm that converts very large datasets into "summaries" that require much less computer memory. The second step is the principal direction divisive partitioning algorithm (PDDP), which partitions the set of "summaries" into k clusters. This partition is the input for smoka, a smoothed k-means based clustering algorithm. The final partition of "summaries" generated by smoka induces a partition of the original dataset. Preliminary numerical experiments with text collections reported in the paper demonstrate smoka's remarkable accuracy and speed of convergence.
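
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: squared Euclidean distance stands in for the divergence; the summarization step is a simple one-pass threshold rule rather than a full BIRCH-like algorithm; the smoothing replaces the hard minimum over centroids with a log-sum-exp soft minimum controlled by a parameter `s`; and the PDDP step is not implemented, so the initial centroids are passed in by the caller. All function and parameter names (`summarize`, `smoothed_kmeans`, `cluster`, `radius`, `s`) are hypothetical.

```python
import math

def summarize(points, radius):
    """Simplified stand-in for the BIRCH-like step (assumption: one pass,
    fixed radius). Each summary keeps a count, a linear sum, and the
    indices of the points it absorbed."""
    summaries = []
    for idx, p in enumerate(points):
        best, best_d = None, math.inf
        for sm in summaries:
            centroid = [v / sm["n"] for v in sm["ls"]]
            d = math.dist(p, centroid)
            if d < best_d:
                best, best_d = sm, d
        if best is not None and best_d <= radius:
            best["n"] += 1
            best["ls"] = [a + b for a, b in zip(best["ls"], p)]
            best["members"].append(idx)
        else:
            summaries.append({"n": 1, "ls": list(p), "members": [idx]})
    return summaries

def smoothed_kmeans(xs, weights, init, s=0.5, iters=50):
    """Weighted soft k-means (assumption: squared Euclidean distance and
    softmax weights exp(-d/s)); as s shrinks it approaches hard k-means."""
    centroids = [list(c) for c in init]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(iters):
        num = [[0.0] * dim for _ in range(k)]
        den = [0.0] * k
        for x, w in zip(xs, weights):
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
            m = min(d2)  # shift by the minimum for numerical stability
            e = [math.exp(-(d - m) / s) for d in d2]
            z = sum(e)
            for l in range(k):
                r = w * e[l] / z  # soft membership scaled by summary size
                den[l] += r
                for j in range(dim):
                    num[l][j] += r * x[j]
        # Keep the old centroid if it received no weight at all.
        centroids = [
            [num[l][j] / den[l] for j in range(dim)] if den[l] > 0 else centroids[l]
            for l in range(k)
        ]
    return centroids

def cluster(points, radius, init, s=0.5):
    """Summarize, cluster the summaries, then let every original point
    inherit the label of the summary that absorbed it."""
    summaries = summarize(points, radius)
    xs = [[v / sm["n"] for v in sm["ls"]] for sm in summaries]
    weights = [sm["n"] for sm in summaries]
    centroids = smoothed_kmeans(xs, weights, init, s)
    labels = [None] * len(points)
    for x, sm in zip(xs, summaries):
        label = min(range(len(centroids)), key=lambda l: math.dist(x, centroids[l]))
        for idx in sm["members"]:
            labels[idx] = label
    return labels
```

Calling `cluster(points, radius, init)` returns one label per original point, even though the smoothed k-means iteration only ever sees the much smaller set of summaries. That is the sense in which a partition of the summaries induces a partition of the original dataset.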