A joint model for discovering and linking entities

Michael L. Wick, Sameer Singh, Harshal Pandya, A. McCallum
Conference on Automated Knowledge Base Construction, 2013-10-27
DOI: 10.1145/2509558.2509570
Citations: 20

Abstract

Entity resolution, the task of automatically determining which mentions refer to the same real-world entity, is a crucial aspect of knowledge base construction and management. However, performing entity resolution at large scale is challenging because (1) the inference algorithms must cope with unavoidable system scalability issues and (2) the search space grows exponentially in the number of mentions. The conventional wisdom has been that performing coreference at these scales requires decomposing the problem: first solve the simpler task of entity linking (matching a set of mentions to a known set of KB entities), then perform entity discovery as a post-processing step (to identify new entities not present in the KB). We argue that this traditional approach harms both entity-linking accuracy and overall coreference accuracy. Therefore, we embrace the challenge of jointly modeling entity linking and entity discovery as a single entity resolution problem. To make progress toward scalability, we (1) present a model that reasons over compact hierarchical entity representations, and (2) propose a novel distributed inference architecture that does not suffer from the synchronicity bottleneck inherent in map-reduce architectures. We demonstrate that more test-time data actually improves coreference accuracy, and show that joint coreference is substantially more accurate than traditional entity linking, reducing error by 75%.
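To make the search-space claim concrete: every way of grouping n mentions into entities is a set partition of the mentions, and the number of set partitions of n items is the Bell number B_n, which grows faster than exponentially. The sketch below is an illustration only, not code from the paper; the function name `bell_numbers` is hypothetical. It computes B_0 through B_n with the Bell triangle.

```python
def bell_numbers(n_max: int) -> list[int]:
    """Return [B_0, B_1, ..., B_n_max], where B_n counts the ways to
    partition n mentions into entities (clusters)."""
    bells = [1]  # B_0 = 1: the empty set has exactly one partition
    row = [1]    # current row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        # Each later entry is its left neighbour plus the entry above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        # The first entry of row k is the Bell number B_k.
        bells.append(row[0])
    return bells


if __name__ == "__main__":
    for n, b in enumerate(bell_numbers(10)):
        print(f"{n:2d} mentions -> {b} possible clusterings")
```

Even 10 mentions admit 115,975 distinct clusterings, which is why exhaustive search is hopeless at knowledge-base scale and why approximate inference over compact entity representations matters.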