Authors: Guangsheng Feng, Huiqiang Wang, Qian Zhao, Ying Liang
Published in: 2008 International Conference on Internet Computing in Science and Engineering (2008-01-28)
DOI: 10.1109/ICICSE.2008.103 (https://doi.org/10.1109/ICICSE.2008.103)
A Novel Clustering Algorithm for Prefix-Coded Data Stream Based upon Median-Tree
Prefix-coded data are common in real-world data streams and appear in many applications. Traditional clustering algorithms give no special treatment to the structure of prefix-coded data, which leads to poor performance and poor clustering results. To address this problem, this paper proposes the concept of a median-tree together with a method for calculating the coding distance. Building on these, a simple algorithm, dfCluster, is put forward that handles prefix-coded data streams efficiently. The algorithm is also analyzed in depth. Finally, experiments demonstrate that dfCluster clusters such data streams more efficiently than the naive algorithm and that, unlike k-means, its performance is not limited by a pre-specified value of k.
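The abstract does not define the coding distance or the median-tree, so the following is only an illustrative sketch under stated assumptions, not the paper's dfCluster. It assumes codes are strings, takes the distance between two codes to be the number of edges separating them in a prefix tree (trie), and uses plain one-pass leader clustering as a stand-in to show how a stream can be clustered without fixing k in advance. The names `common_prefix_len`, `coding_distance`, `leader_cluster`, and the `threshold` parameter are all hypothetical.

```python
from typing import Iterable, List, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Return the length of the longest prefix shared by codes a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def coding_distance(a: str, b: str) -> int:
    """Assumed distance (not the paper's definition): the number of trie
    edges on the path between a and b through their deepest common ancestor."""
    return len(a) + len(b) - 2 * common_prefix_len(a, b)


def leader_cluster(stream: Iterable[str], threshold: int) -> List[Tuple[str, List[str]]]:
    """One-pass leader clustering (a stand-in technique, not dfCluster).

    Each arriving code joins the first cluster whose leader is within
    `threshold`; otherwise it starts a new cluster. The number of clusters
    is therefore decided by the data, not by a preset k.
    """
    clusters: List[Tuple[str, List[str]]] = []
    for code in stream:
        for leader, members in clusters:
            if coding_distance(code, leader) <= threshold:
                members.append(code)
                break
        else:
            # No existing leader is close enough: open a new cluster.
            clusters.append((code, [code]))
    return clusters


if __name__ == "__main__":
    codes = ["1000", "1001", "1011", "0110", "0111"]
    for leader, members in leader_cluster(codes, threshold=2):
        print(leader, members)
```

In this sketch, codes that share a long prefix end up close together, so the threshold controls how deep in the prefix tree two codes must agree before they are grouped. The result depends on arrival order, which is a known property of leader clustering and says nothing about how dfCluster behaves.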