Finding Syntactic Similarities Between XML Documents

17th International Workshop on Database and Expert Systems Applications (DEXA'06). Pub Date: 2006-09-04. DOI: 10.1109/DEXA.2006.62

Davood Rafiei, D. L. Moise, Dabo Sun


Citations: 37

Abstract

Detecting structural similarities between XML documents has been the subject of several recent works, and the proposed algorithms mostly use the tree edit distance between the corresponding trees of the XML documents. However, evaluating a tree edit distance is computationally expensive and does not scale easily to large collections. We show in this paper that a tree edit distance computation is often unnecessary and can be avoided. In particular, we propose a concise structural summary of XML documents and show that a comparison based on this summary is both fast and effective. Our experimental evaluation shows that this method does an excellent job of grouping documents generated by the same DTD, outperforming some of the previously proposed solutions based on tree comparison. Furthermore, the time complexity of the algorithm is linear in the size of the structural description.
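
The abstract does not say what the structural summary contains or how two summaries are compared. As a rough illustration only, the sketch below assumes the summary is the set of distinct root-to-element tag paths and that two summaries are compared by Jaccard overlap. Both choices, and the names `path_summary` and `summary_similarity`, are assumptions made for this example and should not be read as the authors' method.

```python
import xml.etree.ElementTree as ET
from typing import FrozenSet


def path_summary(xml_text: str) -> FrozenSet[str]:
    """Return the set of distinct root-to-element tag paths (assumed summary)."""
    root = ET.fromstring(xml_text)
    paths = set()
    # Iterative traversal: each element is visited exactly once.
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return frozenset(paths)


def summary_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two path summaries (assumed comparison measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two documents with the same structure but different repetition counts,
# and one document with an unrelated structure.
book_1 = "<book><title/><author><name/></author></book>"
book_2 = "<book><title/><author><name/></author><author><name/></author></book>"
article = "<article><title/><journal/></article>"

same_schema = summary_similarity(path_summary(book_1), path_summary(book_2))
different_schema = summary_similarity(path_summary(book_1), path_summary(article))
```

Because repeated elements collapse to a single path, a summary of this kind stays small even for large documents, and comparing two summaries touches each path a bounded number of times rather than aligning whole trees.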

