Fast and Space-Efficient Entity Linking for Queries

Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. Pub Date: 2015-02-02. DOI: 10.1145/2684822.2685317

Roi Blanco, G. Ottaviano, E. Meij


Citations: 202

Abstract

Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints, as it has to be performed before any actual retrieval takes place, typically within milliseconds. In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for an O(k^2) implementation in the number of query terms k. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between distributional semantics of the query words and entities into the model. We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times, at least two orders of magnitude faster than existing systems.
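To illustrate why ignoring dependencies between entity candidates can yield a quadratic-time procedure, the following is a minimal sketch, not the authors' implementation. It assumes each contiguous span of the query is scored independently and that the scores of a segmentation add up, so a dynamic program over the k(k+1)/2 spans suffices. The alias table, the log-probability values, the NIL penalty, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy alias table: span text -> list of (entity, log-probability).
# The entries and scores are invented for illustration only.
ALIAS_TABLE: Dict[str, List[Tuple[str, float]]] = {
    "new york": [("New_York_City", -0.2), ("New_York_(state)", -1.8)],
    "york": [("York", -0.5)],
    "times": [("The_Times", -1.0)],
    "new york times": [("The_New_York_Times", -0.1)],
}

# Assumed penalty for leaving a single query term unlinked.
NIL_SCORE = -3.0

Link = Tuple[str, Optional[str]]


def best_entity(span: str) -> Optional[Tuple[str, float]]:
    """Pick the best candidate for a span, independently of all other spans."""
    candidates = ALIAS_TABLE.get(span)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])


def link_query(query: str) -> List[Link]:
    """Return the highest-scoring segmentation as (span, entity-or-None) pairs."""
    terms = query.lower().split()
    k = len(terms)
    # best[i] holds (score, links) for the best segmentation of the first i terms.
    best: List[Tuple[float, List[Link]]] = [(0.0, [])]
    best += [(float("-inf"), [])] * k
    for end in range(1, k + 1):
        for start in range(end):  # k(k+1)/2 span lookups in total
            span = " ".join(terms[start:end])
            match = best_entity(span)
            if match is not None:
                entity, score = match
            elif end - start == 1:
                entity, score = None, NIL_SCORE  # leave a single term unlinked
            else:
                continue  # multi-term span with no candidates: skip it
            total = best[start][0] + score
            if total > best[end][0]:
                best[end] = (total, best[start][1] + [(span, entity)])
    return best[k][1]


if __name__ == "__main__":
    print(link_query("new york times"))
    print(link_query("new york hotels"))
```

Because each span is scored without looking at the entities chosen for other spans, the inner loop needs only one table lookup per span. A model with dependencies between candidates would instead have to compare combinations of entities across spans.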

