XML Duplicate Detection with Improved network pruning algorithm

V. Borate, Sudipta Giri
Published in: 2015 International Conference on Pervasive Computing (ICPC), 2015-04-16
DOI: 10.1109/PERVASIVE.2015.7087007 (https://doi.org/10.1109/PERVASIVE.2015.7087007)
Citations: 1

Abstract

Duplicate detection is a critical task for any organization's database. Duplicates are representations of the same real-world entity or object that differ in structure or format. They can occur in relational data, in complex data, and in hierarchical data such as XML. Much prior work addresses duplicate detection in relational data, but attention has recently shifted to XML data, because XML is very popular for data storage and is used extensively for data exchange between organizations. We have carried out an extensive literature survey on this topic and propose a duplicate detection method that combines ideas from existing papers with some original ideas of our own. In addition to improving efficiency and effectiveness, the method also checks for typographical errors when comparing two XML elements. To test the correctness of the Improved network pruning method, we compare it with an existing duplicate detection system, focusing on how it achieves higher precision and recall on the various datasets used.
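
As a rough illustration of what a typo-tolerant comparison of two XML elements and a precision/recall evaluation might look like, here is a minimal Python sketch. It is not the Improved network pruning method, whose details the abstract does not give. The edit-distance similarity, the averaging over shared child tags, the 0.8 threshold, the sample `<person>` records, and all function names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; small typos lower the score only slightly."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def element_similarity(x: ET.Element, y: ET.Element) -> float:
    """Average text similarity over the child tags both elements share."""
    x_vals = {child.tag: (child.text or "") for child in x}
    y_vals = {child.tag: (child.text or "") for child in y}
    shared = x_vals.keys() & y_vals.keys()
    if not shared:
        return 0.0
    return sum(text_similarity(x_vals[t], y_vals[t]) for t in shared) / len(shared)


def precision_recall(predicted: set, actual: set) -> tuple:
    """Precision and recall of predicted duplicate pairs against known pairs."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical records: the same person, with a typo in one name.
a = ET.fromstring("<person><name>John Smith</name><city>Pune</city></person>")
b = ET.fromstring("<person><name>Jon Smith</name><city>Pune</city></person>")

score = element_similarity(a, b)
is_duplicate = score >= 0.8  # assumed threshold, not taken from the paper
```

Averaging over shared child tags is only one possible design; a real system would also need to handle nested elements, repeated tags, and missing fields, which this sketch ignores.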