Entity Disambiguation with Linkless Knowledge Bases
Yang Li, Shulong Tan, Huan Sun, Jiawei Han, D. Roth, Xifeng Yan
Proceedings of the 25th International Conference on World Wide Web, 2016-04-11
DOI: 10.1145/2872427.2883068
Citations: 24
Abstract
Named Entity Disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a reference knowledge base (e.g., Wikipedia). Such disambiguation helps add semantics to plain text and distinguish homonymous entities. Previous research has tackled this problem with two types of context-aware features derived from the reference knowledge base: context similarity and semantic relatedness. Both features rely heavily on the cross-document hyperlinks within the knowledge base: the semantic relatedness feature is measured directly via those hyperlinks, while the context similarity feature implicitly uses them to expand entity candidates' descriptions before comparing them against the query context. Unfortunately, cross-document hyperlinks are rarely available in many closed-domain knowledge bases, and adding such links manually is very expensive. As a result, few algorithms work well on linkless knowledge bases. In this work, we propose the challenging problem of Named Entity Disambiguation with Linkless Knowledge Bases (LNED) and tackle it by leveraging the useful disambiguation evidence scattered across the reference knowledge base. We propose a generative model to automatically mine such evidence out of noisy information. The mined evidence can mimic the role of the missing links and help boost LNED performance. Experimental results show that our proposed method substantially improves disambiguation accuracy over the baseline approaches.
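To make the hyperlink dependency described above concrete, the Python sketch below illustrates the two standard context-aware features the abstract refers to: a bag-of-words context similarity, and a semantic relatedness score in the style of the widely used Milne-Witten measure, which requires incoming cross-document links. This is a minimal illustration under those assumptions, not the authors' implementation; all function names and data structures here are hypothetical.

```python
import math
from collections import Counter

def context_similarity(mention_context, candidate_description):
    """Cosine similarity between bag-of-words vectors of the mention's
    surrounding text and a candidate entity's description (both given
    as lists of tokens)."""
    a, b = Counter(mention_context), Counter(candidate_description)
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def semantic_relatedness(in_links_a, in_links_b, num_entities):
    """Milne-Witten-style relatedness between two candidate entities,
    based on the overlap of the sets of entities that link to them.
    This is only computable when cross-document hyperlinks exist --
    exactly the resource that is missing in the LNED setting."""
    a, b = set(in_links_a), set(in_links_b)
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    return 1.0 - ((math.log(max(len(a), len(b))) - math.log(overlap))
                  / (math.log(num_entities) - math.log(min(len(a), len(b)))))
```

In a link-rich knowledge base, a disambiguation system can combine both signals; in a linkless knowledge base, `semantic_relatedness` has no input and the context expansion step of `context_similarity` also breaks down, which is the gap the paper's mined evidence is meant to fill.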