HybridTreeMiner: an efficient algorithm for mining frequent rooted trees and free trees using canonical forms

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date: 2004-06-21. DOI: 10.1109/SSDBM.2004.41

Yun Chi, Yirong Yang, R. Muntz


Citations: 146

Abstract



Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. In this paper, we present HybridTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of rooted unordered trees. The algorithm mines frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees. The enumeration tree is defined based on a novel canonical form for rooted unordered trees, the breadth-first canonical form (BFCF). By extending the definitions of our canonical form and enumeration tree to free trees, our algorithm can efficiently handle databases of free trees as well. We study the performance of our algorithms through extensive experiments based on both synthetic data and datasets from real applications. The experiments show that our algorithm is competitive with known rooted tree mining algorithms and is one to two orders of magnitude faster than a known algorithm for mining frequent free trees.
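To illustrate why a canonical form matters when mining unordered trees, here is a minimal sketch. It is not the paper's BFCF: it substitutes a simpler depth-first encoding that recursively sorts each node's children, and the `(label, children)` tuple representation and the `canonical` function name are assumptions made for this example. The point is only that two trees differing in sibling order map to the same string, so a miner can count them as a single pattern.

```python
from typing import List, Tuple

# Hypothetical representation for this sketch: a node is (label, list of child nodes).
# Labels are assumed not to contain parentheses.
Tree = Tuple[str, List["Tree"]]


def canonical(tree: Tree) -> str:
    """Return a string that is the same for every sibling ordering of an unordered tree.

    This is a sort-based depth-first encoding, not the breadth-first
    canonical form (BFCF) defined in the paper.
    """
    label, children = tree
    # Encode each child subtree first, then sort the encodings so that
    # the order in which siblings were listed no longer matters.
    encoded = sorted(canonical(child) for child in children)
    return label + "(" + "".join(encoded) + ")"


# t1 and t2 are the same unordered tree with the children of A swapped.
t1: Tree = ("A", [("B", [("D", [])]), ("C", [])])
t2: Tree = ("A", [("C", []), ("B", [("D", [])])])
# t3 is a different tree: D hangs under C instead of under B.
t3: Tree = ("A", [("B", []), ("C", [("D", [])])])

print(canonical(t1) == canonical(t2))  # same unordered tree
print(canonical(t1) == canonical(t3))  # different trees
```

With a unique string per unordered tree, candidate subtrees can be compared and deduplicated by plain string equality, and that property is what lets an enumeration tree list each subtree exactly once.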

