Mining evidences for named entity disambiguation

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining Pub Date : 2013-08-11 DOI:10.1145/2487575.2487681

Yang Li, Chi Wang, Fangqiu Han, Jiawei Han, D. Roth, Xifeng Yan

Citations: 101

Abstract

Named entity disambiguation is the task of disambiguating named entity mentions in natural language text and linking them to their corresponding entries in a knowledge base such as Wikipedia. Such disambiguation can help enhance readability and add semantics to plain text. It is also a central step in constructing a high-quality information network or knowledge graph from unstructured text. Previous research has tackled this problem by making use of various textual and structural features from a knowledge base. Most of the proposed algorithms assume that a knowledge base can provide enough explicit and useful information to help disambiguate a mention to the right entity. However, existing knowledge bases are rarely complete (and likely never will be), which leads to poor performance on short queries whose contexts are not well known. In such cases, we need to collect additional evidence scattered across internal and external corpora to augment the knowledge bases and enhance their disambiguation power. In this work, we propose a generative model and an incremental algorithm to automatically mine useful evidence across documents. By explicitly modeling a "background topic" and "unknown entities", our model is able to harvest useful evidence from noisy information. Experimental results show that our proposed method significantly outperforms state-of-the-art approaches, boosting disambiguation accuracy from 43% (baseline) to 86% on short queries derived from tweets.
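The idea of augmenting a sparse knowledge base with evidence mined from other documents can be illustrated with a toy sketch. This is not the paper's generative model: it swaps in a simple word-overlap score and a bootstrapping loop, and every name in it (`KB`, `BACKGROUND`, `disambiguate`, `mine_evidence`, the sample entities and texts) is a hypothetical chosen for illustration. A fixed list of uninformative words stands in for the "background topic", and returning `None` stands in for an "unknown entity".

```python
# Toy illustration only: word-overlap bootstrapping, NOT the paper's generative model.
# All names and data below are hypothetical.

# Tiny knowledge base: entity -> context words it is known to co-occur with.
KB = {
    "Michael Jordan (basketball)": {"nba", "bulls", "basketball"},
    "Michael Jordan (professor)": {"berkeley", "machine", "learning"},
}

# Words treated as uninformative for every entity (a stand-in for a background topic).
BACKGROUND = {"the", "a", "is", "at", "on", "by", "and", "about", "talk", "today"}

# Context words around the mention "Jordan" in two documents (mention removed).
DOCS = [
    "talk at berkeley about graphical models".split(),
    "graphical models tutorial".split(),
]

# A short query whose context the original KB does not cover.
QUERY = "graphical models".split()


def score(context, evidence):
    """Count distinct non-background context words that appear in the evidence set."""
    return len((set(context) - BACKGROUND) & evidence)


def disambiguate(context, kb, min_score=1):
    """Return the best-scoring entity, or None when the score is too low or tied."""
    scores = {entity: score(context, words) for entity, words in kb.items()}
    ranked = sorted(scores.values(), reverse=True)
    best = max(scores, key=scores.get)
    if ranked[0] < min_score or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None  # treated as an unknown entity: contributes no evidence
    return best


def mine_evidence(documents, kb, rounds=2):
    """Link documents one by one and fold their non-background words into the evidence."""
    kb = {entity: set(words) for entity, words in kb.items()}  # leave the input untouched
    for _ in range(rounds):
        for doc in documents:
            entity = disambiguate(doc, kb)
            if entity is not None:
                kb[entity] |= set(doc) - BACKGROUND
    return kb
```

With the original `KB`, `disambiguate(QUERY, KB)` returns `None` because neither entity shares a word with the query. After `mine_evidence(DOCS, KB)`, the first document links to the professor through "berkeley" and contributes "graphical" and "models" as new evidence, so the same query then resolves to "Michael Jordan (professor)".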
