Title: Mining frequent rooted subtrees in XML data with Me-Tree
Authors: Wan-Song Zhang, Daxin Liu, Jianpei Zhang
Published in: Proceedings of the 2004 IEEE Systems and Information Engineering Design Symposium, 2004
Publication date: 2004-04-16
DOI: 10.1109/SIEDS.2004.239908 (https://doi.org/10.1109/SIEDS.2004.239908)
Citations: 1
Abstract
Due to the rapid progress of network and storage technologies, a huge amount of electronic data, such as Web pages and XML data, has become available on the Internet. These weakly structured documents have no rigid structure and are often called semistructured data. Hence, there is increasing demand for efficient methods for discovering patterns in large collections of semistructured data. We study the data mining problem of discovering frequent subtrees in a large collection of XML data, where both the patterns and the data are modeled as labeled ordered trees. We present an efficient algorithm, RSTMiner, that uses a special structure called the Me-tree to compute all rooted subtrees that appear in a collection of XML trees with frequency above a user-specified threshold. In this algorithm, the Me-tree serves as a merging tree that supplies schema information for efficient pruning and mining of frequent subtrees. The keys to the algorithm are efficient pruning of candidates using the Me-tree structure and incremental enumeration of all rooted subtrees in canonical form, based on an extended rightmost expansion technique.
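
To illustrate the general idea of rightmost expansion over labeled ordered trees, the following is a minimal Python sketch. It is not RSTMiner: the abstract does not specify the Me-tree or the extended expansion, so neither is modeled here. The sketch assumes that a pattern occurs in a data tree as an induced ordered subtree (parent-child edges and sibling order preserved), that frequency means the number of data trees containing the pattern, and that a pattern is stored as a preorder sequence of (depth, label) pairs. The names `mine`, `contains`, `matches_at`, and `to_tree` are hypothetical.

```python
# Trees are (label, [children]); patterns are preorder tuples of (depth, label).

def to_tree(pattern):
    """Rebuild a nested [label, children] tree from a preorder (depth, label) pattern."""
    root = [pattern[0][1], []]
    stack = [root]  # stack[d] is the most recent node at depth d
    for depth, label in pattern[1:]:
        node = [label, []]
        del stack[depth:]                 # drop nodes at this depth or deeper
        stack[depth - 1][1].append(node)  # attach to the parent on the rightmost path
        stack.append(node)
    return root

def matches_at(p, d):
    """True if pattern p matches with its root mapped to data node d."""
    if p[0] != d[0]:
        return False
    i = 0  # index of the next pattern child to place
    for child in d[1]:  # greedy left-to-right scan keeps sibling order
        if i < len(p[1]) and matches_at(p[1][i], child):
            i += 1
    return i == len(p[1])

def contains(p, d):
    """True if pattern p occurs anywhere in data tree d."""
    return matches_at(p, d) or any(contains(p, c) for c in d[1])

def labels(d):
    """Yield every label in data tree d."""
    yield d[0]
    for c in d[1]:
        yield from labels(c)

def mine(trees, min_support):
    """Return {pattern: support} for all patterns with support >= min_support."""
    alphabet = sorted({l for t in trees for l in labels(t)})
    frequent = {}

    def expand(pattern):
        p = to_tree(pattern)
        support = sum(contains(p, t) for t in trees)
        if support < min_support:
            return  # no extension of an infrequent pattern can be frequent
        frequent[pattern] = support
        # Rightmost expansion: the new node becomes the last child of some
        # node on the rightmost path, so its depth is 1 .. rightmost depth + 1.
        for depth in range(1, pattern[-1][0] + 2):
            for label in alphabet:
                expand(pattern + ((depth, label),))

    for label in alphabet:
        expand(((0, label),))
    return frequent

if __name__ == "__main__":
    t1 = ("a", [("b", []), ("c", [])])
    t2 = ("a", [("b", []), ("d", [])])
    for pattern, support in mine([t1, t2], 2).items():
        print(pattern, support)
```

Because every ordered tree has exactly one preorder (depth, label) sequence, and each sequence is reached by exactly one chain of rightmost extensions, this enumeration visits each candidate pattern at most once without a separate duplicate check.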