A clustering method using an irregular size cell graph

15th International Workshop on Research Issues in Data Engineering: Stream Data Mining and Applications (RIDE-SDMA'05). Pub Date: 2005-04-03. DOI: 10.1109/RIDE.2005.5

Tomotake Nakamura, Y. Kamidoi, S. Wakabayashi, N. Yoshida


Citations: 1

Abstract

This paper proposes FlexDice, a clustering method (a data mining technique) for large, high-dimensional datasets. The data structure used in FlexDice shares some features with a Quadtree but differs from it in crucial ways. The most important difference is that a Quadtree is a tree, whereas the structure used in FlexDice is a graph. The algorithm that constructs a Quadtree forms cells hierarchically by dividing the data object space in a top-down manner, so searching for or indexing data objects requires traversing from the root of the tree to its leaves. FlexDice needs no tree structure, because such traversals are unnecessary. Clustering, however, requires merging the relevant cells that contain similar data objects, rather than choosing a dividing hyperplane. FlexDice therefore creates neighboring links among relevant cells in every layer after dividing cells, and merges the cells that contain similar data objects. To reduce memory usage, FlexDice dynamically removes worthless cells and keeps only the worthwhile cells, which contain data objects, together with the parent cells needed to create the neighboring links of those worthwhile cells. Once the neighboring links among worthwhile cells have been created, these parent cells are removed from memory. The paper presents the differences between the data structure used in FlexDice and the Quadtree, and shows that the structure used in FlexDice is suitable for clustering.
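
The idea of linking neighboring cells and merging them, as described in the abstract, can be illustrated with a small sketch. This is not the FlexDice algorithm itself, whose details are not given here: it assumes a single layer of fixed-size cells rather than irregular, multi-layer cells, and the names `cell_of`, `cluster_by_cells`, `cell_size`, and `min_points` are hypothetical. It only shows how sparse cells can be dropped and how neighbor links among the remaining cells yield clusters without any root-to-leaf traversal.

```python
from collections import defaultdict


def cell_of(point, cell_size):
    """Return the integer grid coordinates of the cell containing `point`."""
    return tuple(int(x // cell_size) for x in point)


def cluster_by_cells(points, cell_size, min_points=1):
    """Group points by merging face-adjacent dense cells (illustrative only)."""
    # Bucket points by cell. Cells that receive no point are never created,
    # so memory grows with the number of occupied cells, not the whole space.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p, cell_size)].append(p)

    # Assumption: a cell is "worthwhile" if it holds at least `min_points`.
    dense = {c: pts for c, pts in cells.items() if len(pts) >= min_points}

    # Union-find over the dense cells; each cell starts as its own cluster.
    parent = {c: c for c in dense}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    # Neighbor links: checking the +1 neighbor on each axis covers every
    # face-adjacent pair once, because adjacency is symmetric.
    dims = len(next(iter(dense))) if dense else 0
    for c in dense:
        for axis in range(dims):
            neighbor = c[:axis] + (c[axis] + 1,) + c[axis + 1:]
            if neighbor in dense:
                parent[find(c)] = find(neighbor)

    # Collect the points of all cells that ended up in the same component.
    clusters = defaultdict(list)
    for c, pts in dense.items():
        clusters[find(c)].extend(pts)
    return list(clusters.values())
```

For example, calling `cluster_by_cells` on the points `(0.1, 0.2)`, `(0.9, 0.8)`, `(1.5, 0.5)`, `(5.2, 5.1)`, `(5.8, 5.9)` with `cell_size=1.0` merges the two adjacent cells near the origin into one cluster of three points and leaves the distant cell as a second cluster of two points. Storing only occupied cells in a dictionary is a design choice made for this sketch, in the spirit of the memory reduction the abstract describes.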

