CEDAL: time-efficient detection of erroneous links in large-scale link repositories

André Valdestilhas, Tommaso Soru, A. N. Ngomo
Published in: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, 2017-08-23
DOI: https://doi.org/10.1145/3106426.3106497
Citations: 10

Abstract

More than 500 million facts on the Linked Data Web are statements that link resources across knowledge bases. These links are crucial for the Linked Data Web because they enable a large number of tasks, including cross-ontology question answering and federated queries. However, many of these links are erroneous and can lead such applications to produce absurd results. We present a time-efficient and complete approach for detecting erroneous links for transitive properties. To this end, we combine the semantics of URIs on the Data Web with an efficient graph partitioning algorithm. We apply our algorithm to the LinkLion repository and show that it can analyze 19,200,114 links in 4.6 minutes. Our results show that at least 13% of the owl:sameAs links we considered are erroneous. In addition, our analysis of link provenance reveals agents and knowledge bases that commonly display poor linking. Our algorithm can easily be executed in parallel and on a GPU. We show that these implementations are up to two orders of magnitude faster than classical reasoners and a non-parallel implementation.
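The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under two stated assumptions: (a) links for a transitive property such as owl:sameAs can be partitioned into connected components with a union-find structure, and (b) the host part of a URI identifies its knowledge base, and two distinct URIs from the same knowledge base should not end up in the same equivalence cluster. All function names and the example hosts are hypothetical and are not taken from CEDAL itself.

```python
from collections import defaultdict
from urllib.parse import urlparse


def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def partition(links):
    """Group URIs into connected components of the (undirected) link graph."""
    parent = {}
    for source, target in links:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t
    clusters = defaultdict(set)
    for uri in parent:
        clusters[find(parent, uri)].add(uri)
    return list(clusters.values())


def dataset_of(uri):
    # Assumption: the URI host stands in for the knowledge base.
    return urlparse(uri).netloc


def suspicious_clusters(links):
    """Return clusters holding two or more distinct URIs from one dataset."""
    flagged = []
    for cluster in partition(links):
        by_dataset = defaultdict(set)
        for uri in cluster:
            by_dataset[dataset_of(uri)].add(uri)
        if any(len(uris) > 1 for uris in by_dataset.values()):
            flagged.append(cluster)
    return flagged
```

A flagged cluster only says that at least one link inside it is likely wrong; deciding which link to drop would need extra evidence, such as the provenance information the abstract mentions.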