Pivot-based Candidate Retrieval for Cross-lingual Entity Linking

Proceedings of the Web Conference 2021. Pub Date: 2021-04-19. DOI: 10.1145/3442381.3449852

Qian Liu, Xiubo Geng, Jie Lu, Daxin Jiang


Citations: 2

Abstract

Entity candidate retrieval plays a critical role in cross-lingual entity linking (XEL). Given a mention, that is, a span of text in a source-language sentence or question, it must retrieve a list of plausible candidate entities from a large knowledge graph in a target language. Existing work falls mainly into two categories: lexicon-based and semantic-based approaches. Lexicon-based approaches usually build cross-lingual and mention-entity lexicons; they are effective but rely heavily on bilingual resources (e.g., inter-language links in Wikipedia). Semantic-based approaches map mentions and entities in different languages into a unified embedding space, which reduces dependence on large-scale bilingual dictionaries, but their effectiveness is limited by the representation capacity of fixed-length vectors. In this paper, we propose a pivot-based approach that inherits the advantages of both while avoiding their limitations. It uses an intermediary set of plausible target-language mentions as pivots to bridge two gaps: the cross-lingual gap and the mention-entity gap. Specifically, it first converts a source-language mention into an intermediary set of plausible target-language mentions through cross-lingual semantic retrieval and a selective mechanism, and then retrieves candidate entities from the generated mentions by lexical retrieval. The proposed approach relies only on a small bilingual word dictionary and fully exploits the benefits of both lexical and semantic matching. Experimental results on two challenging cross-lingual entity linking datasets spanning over 11 languages show that the pivot-based approach outperforms both the lexicon-based and semantic-based approaches by a large margin.
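The two-stage flow described in the abstract (semantic retrieval of target-language pivot mentions, then lexical retrieval of entities) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration and not the authors' implementation: the toy vectors, the knowledge-base entries, the function names, the similarity threshold standing in for the selective mechanism, and the character-trigram Jaccard overlap standing in for lexical retrieval.

```python
import math

# Hypothetical toy data. A real system would obtain these vectors from a
# trained cross-lingual encoder; the numbers here are made up.
SOURCE_MENTION_VECS = {"Alemania": [0.9, 0.1, 0.0]}
TARGET_MENTION_VECS = {
    "Germany": [0.88, 0.12, 0.02],
    "German": [0.7, 0.3, 0.1],
    "France": [0.1, 0.9, 0.1],
}
# Hypothetical target-language knowledge base: entity id -> entity name.
KB = {"Q183": "Germany", "Q188": "German language", "Q142": "France"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_pivots(source_mention, top_k=2, min_score=0.9):
    """Stage 1: rank target-language mentions by embedding similarity.

    The top_k cut plus the min_score threshold is an assumed stand-in for
    the selective mechanism mentioned in the abstract.
    """
    query = SOURCE_MENTION_VECS[source_mention]
    scored = sorted(
        ((cosine(query, vec), m) for m, vec in TARGET_MENTION_VECS.items()),
        reverse=True,
    )
    return [m for score, m in scored[:top_k] if score >= min_score]


def char_ngrams(text, n=3):
    """Set of lowercase character n-grams of a string."""
    text = text.lower()
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}


def lexical_score(a, b):
    """Jaccard overlap of character trigrams (assumed lexical matcher)."""
    x, y = char_ngrams(a), char_ngrams(b)
    return len(x & y) / len(x | y)


def retrieve_candidates(source_mention, min_overlap=0.2):
    """Stage 2: match pivot mentions lexically against entity names."""
    pivots = retrieve_pivots(source_mention)
    scores = {}
    for entity_id, name in KB.items():
        best = max((lexical_score(p, name) for p in pivots), default=0.0)
        if best >= min_overlap:
            scores[entity_id] = best
    # Highest lexical score first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(retrieve_pivots("Alemania"))
    print(retrieve_candidates("Alemania"))
```

In this toy setup, the source mention "Alemania" shares no surface form with any entity name, so direct lexical matching would fail; the pivots recovered in the embedding space are what make the lexical stage usable, which is the intuition the abstract describes.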


