A joint model for discovering and linking entities

Conference on Automated Knowledge Base Construction. Pub date: 2013-10-27. DOI: 10.1145/2509558.2509570

Michael L. Wick, Sameer Singh, Harshal Pandya, A. McCallum


Citations: 20

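Abstract

Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scales is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. Current conventional wisdom has been that performing coreference at these scales requires decomposing the problem by first solving the simpler task of entity-linking (matching a set of mentions to a known set of KB entities), and then performing entity discovery as a post-processing step (to identify new entities not present in the KB). However, we argue that this traditional approach is harmful to both entity-linking and overall coreference accuracy. Therefore, we embrace the challenge of jointly modeling entity-linking and entity-discovery as a single entity resolution problem. In order to make progress towards scalability we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck which is inherent in map-reduce architectures. We demonstrate that more test-time data actually improves the accuracy of coreference, and show that joint coreference is substantially more accurate than traditional entity-linking, reducing error by 75%.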
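Point (2) about the search space can be made concrete. If coreference is viewed as partitioning a set of mentions into entities, then the number of candidate partitions of n mentions is the Bell number B_n, which grows even faster than exponentially. The sketch below is a minimal illustration and is not taken from the paper. It computes B_n with the Bell triangle and cross-checks the count by brute-force enumeration. The function names `bell_numbers` and `partitions` and the mention strings are hypothetical.

```python
from typing import Iterator


def bell_numbers(n_max: int) -> list[int]:
    """Return [B_0, ..., B_n_max] using the Bell triangle.

    B_n counts the partitions of an n-element set, i.e. the number of
    distinct ways n mentions could be grouped into entities.
    """
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]
    for _ in range(n_max):
        # Each row starts with the last entry of the previous row; every later
        # entry is its left neighbour plus the previous-row entry at the
        # neighbour's position.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def partitions(items: list[str]) -> Iterator[list[list[str]]]:
    """Yield every way to group `items` into non-empty clusters (entities)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing cluster in turn (link to that entity)...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ...or give it a cluster of its own (discover a new entity).
        yield [[first]] + smaller


if __name__ == "__main__":
    mentions = ["J. Smith", "John Smith", "Smith"]  # hypothetical mention strings
    options = list(partitions(mentions))
    print(len(options), "possible entity assignments")  # 5, i.e. B_3
    for n in (10, 20, 40):
        print(f"{n} mentions: {bell_numbers(n)[n]:,} partitions vs 2^{n} = {2**n:,}")
```

Brute-force enumeration is only feasible for a handful of mentions. The triangle gives exact counts cheaply, which shows why exhaustive search over clusterings is out of reach at knowledge-base scale.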



