Named entity disambiguation by leveraging Wikipedia semantic knowledge

Proceedings of the 18th ACM conference on Information and knowledge management. Pub Date: 2009-11-02. DOI: 10.1145/1645953.1645983

Xianpei Han, Jun Zhao


Citations: 189

Abstract

The name ambiguity problem has created an urgent demand for efficient, high-quality named entity disambiguation methods. The key problem in named entity disambiguation is measuring the similarity between occurrences of names. Traditional methods measure this similarity using the bag-of-words (BOW) model. The BOW model, however, ignores semantic relations such as social relatedness between named entities, associative relatedness between concepts, and polysemy and synonymy between key terms, so it cannot reflect the actual similarity. Some research has investigated social networks as background knowledge for disambiguation. Social networks, however, capture only the social relatedness between named entities and often suffer from limited coverage. To overcome these deficiencies, this paper proposes using Wikipedia as the background knowledge for disambiguation; Wikipedia surpasses other knowledge bases in its coverage of concepts, rich semantic information, and up-to-date content. By leveraging Wikipedia's semantic knowledge, such as social relatedness between named entities and associative relatedness between concepts, we can measure the similarity between occurrences of names more accurately. In particular, we construct a large-scale semantic network from Wikipedia so that the semantic knowledge can be used efficiently and effectively. Based on the constructed semantic network, we propose a novel similarity measure that leverages Wikipedia semantic knowledge for disambiguation. The proposed method has been tested on the standard WePS data sets. Empirical results show that our method achieves a 10.7% improvement in disambiguation performance over traditional BOW-based methods and a 16.7% improvement over traditional social-network-based methods.
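The contrast the abstract draws between BOW similarity and similarity informed by semantic relatedness can be illustrated with a small sketch. This is not the paper's similarity measure, which the abstract does not specify. The relatedness table, its scores, the example contexts, and the function names below are all hypothetical, chosen only to show how two name occurrences with no shared terms can still be scored as similar once relatedness between concepts is taken into account.

```python
import math
from collections import Counter

def bow_cosine(a, b):
    """Cosine similarity between two term lists under the bag-of-words model."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical relatedness scores in [0, 1]. A real system would derive
# these from a knowledge base; the values here are invented for illustration.
RELATEDNESS = {
    frozenset(("NBA", "Chicago Bulls")): 0.8,
    frozenset(("NBA", "basketball")): 0.9,
    frozenset(("MVP", "basketball")): 0.5,
}

def relatedness(c1, c2):
    """Look up the relatedness of two concepts; identical concepts score 1."""
    if c1 == c2:
        return 1.0
    return RELATEDNESS.get(frozenset((c1, c2)), 0.0)

def semantic_similarity(a, b):
    """Symmetric best-match average: each concept is matched to its most
    related concept in the other context, and both directions are averaged.
    This is an assumed stand-in, not the measure proposed in the paper."""
    if not a or not b:
        return 0.0

    def directed(src, dst):
        return sum(max(relatedness(s, d) for d in dst) for s in src) / len(src)

    return 0.5 * (directed(a, b) + directed(b, a))

# Three hypothetical contexts in which the same ambiguous name occurs.
ctx_a = ["NBA", "MVP"]
ctx_b = ["Chicago Bulls", "basketball"]
ctx_c = ["machine learning", "Berkeley"]

print("BOW a~b:     ", bow_cosine(ctx_a, ctx_b))
print("semantic a~b:", semantic_similarity(ctx_a, ctx_b))
print("semantic a~c:", semantic_similarity(ctx_a, ctx_c))
```

In this toy setup the first two contexts share no terms, so the BOW score is zero, while the relatedness-based score is positive; the third context is unrelated to the first under both measures.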

