Optimizing Hyper-Phrase Queries

Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval. Pub Date: 2020-09-14. DOI: 10.1145/3409256.3409827

Dhruv Gupta, K. Berberich

Citations: 0

Abstract

A hyper-phrase query (HPQ) consists of a sequence of phrase sets. Such queries naturally arise when attempting to spot knowledge graph (KG) facts or sets of KG facts in large document collections to establish their provenance. Our approach addresses this challenge by proposing query operators to detect text regions in documents that correspond to the HPQ as combinations of n-grams and skip-grams. The optimization lies in identifying the most cost-efficient order of query operators that can be executed to identify the text regions containing the HPQ. We show the efficiency of our optimizations on spotting facts from Wikidata in document collections amounting to more than thirty million documents.
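To make the notion of "a sequence of phrase sets" concrete, the following is a minimal sketch of a naive HPQ matcher over a tokenized document. It is not the operator-based, cost-optimized method the abstract describes: the function name `match_hpq`, the `max_gap` parameter (a rough stand-in for skip-gram style gaps), and the matching semantics (one phrase from each set, in order) are all assumptions made for illustration, as is the example fact.

```python
from typing import List, Sequence, Set, Tuple

Phrase = Tuple[str, ...]


def match_hpq(tokens: Sequence[str],
              hpq: Sequence[Set[Phrase]],
              max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return (begin, end) token spans matching a hyper-phrase query.

    Assumed semantics (hypothetical, not taken from the paper): a span
    matches if it contains one phrase from each phrase set, in query order,
    with at most `max_gap` tokens between consecutive phrases.
    `end` is exclusive.
    """
    spans = set()

    def extend(set_idx: int, begin: int, pos: int) -> None:
        # All phrase sets consumed: record the matched region.
        if set_idx == len(hpq):
            spans.add((begin, pos))
            return
        for phrase in hpq[set_idx]:
            n = len(phrase)
            last_start = len(tokens) - n
            if set_idx == 0:
                # The first phrase may start anywhere in the document.
                candidates = range(0, last_start + 1)
            else:
                # Later phrases must start within max_gap tokens of pos.
                candidates = range(pos, min(pos + max_gap, last_start) + 1)
            for i in candidates:
                if tuple(tokens[i:i + n]) == phrase:
                    extend(set_idx + 1, i if set_idx == 0 else begin, i + n)

    extend(0, 0, 0)
    return sorted(spans)


# Illustrative query for a single KG-style fact, with surface-form variants.
tokens = "albert einstein was born in ulm germany".split()
hpq = [
    {("albert", "einstein"), ("einstein",)},
    {("was", "born", "in"), ("born", "in")},
    {("ulm",)},
]
result = match_hpq(tokens, hpq)
```

This brute-force version tries every phrase of every set at every admissible position. The abstract's contribution is precisely to avoid that by choosing a cost-efficient order in which to execute query operators, which this sketch does not attempt.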


