琳达:分布式数据网络规模的实体匹配

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI:10.1145/2396761.2398582

Christoph Böhm, Gerard de Melo, Felix Naumann, G. Weikum

{"title":"琳达:分布式数据网络规模的实体匹配","authors":"Christoph Böhm, Gerard de Melo, Felix Naumann, G. Weikum","doi":"10.1145/2396761.2398582","DOIUrl":null,"url":null,"abstract":"Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating \"sameAs\" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"94","resultStr":"{\"title\":\"LINDA: distributed web-of-data-scale entity matching\",\"authors\":\"Christoph Böhm, Gerard de Melo, Felix Naumann, G. Weikum\",\"doi\":\"10.1145/2396761.2398582\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating \\\"sameAs\\\" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.\",\"PeriodicalId\":313414,\"journal\":{\"name\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"94\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2396761.2398582\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2396761.2398582","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 94

摘要

关联数据(Linked Data)作为一种连接Web上结构化数据的强大方式而出现。然而，关联数据源之间的交叉链接并不像人们希望的那样广泛。在本文中，我们以全局一致的方式形式化了跨数据源自动创建“相同”链接的任务。我们的算法在多核和分布式版本中提出，通过考虑匹配的联合证据来实现这种链接生成。实验证实，尽管存在巨大的异质性和令人生畏的规模，我们的系统仍然可以扩展超过1亿个实体，并提供高度准确的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

LINDA: distributed web-of-data-scale entity matching

Linked Data has emerged as a powerful way of interconnecting structured data on the Web. However, the cross-linkage between Linked Data sources is not as extensive as one would hope for. In this paper, we formalize the task of automatically creating "sameAs" links across data sources in a globally consistent manner. Our algorithm, presented in a multi-core as well as a distributed version, achieves this link generation by accounting for joint evidence of a match. Experiments confirm that our system scales beyond 100 million entities and delivers highly accurate results despite the vast heterogeneity and daunting scale.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 21st ACM international conference on Information and knowledge management

自引率

0.00%

发文量