Efficient Query Processing for Scalable Web Search

IF 12.9 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval | Pub Date: 2018-12-23 | DOI: 10.1561/1500000057

N. Tonellotto, C. Macdonald, I. Ounis

Pages: 319–500

Citations: 40

Abstract

Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also providing the latest trends in the literature in efficient query processing, including the coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query trade-offs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met.
Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware & software architectures.
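The abstract contrasts term-at-a-time (TAAT) and document-at-a-time (DAAT) processing and names WAND as a dynamic pruning algorithm. The Python sketch below is not taken from the survey; it only illustrates the three strategies under stated assumptions. `INDEX` is a hypothetical toy index whose postings carry precomputed integer impact scores (standing in for a weighting model such as BM25), and `push`, `ranked` and `doc_at` are hypothetical helpers. The `wand` function is simplified: it advances cursors by linear scan rather than with skip pointers, and it always moves the leading cursor when the pivot is not aligned. BMW is not shown.

```python
import heapq
from collections import defaultdict

# Hypothetical toy inverted index: term -> postings sorted by doc id.
# Each posting is (doc_id, impact); the integer impact is an assumed
# precomputed score contribution standing in for a model such as BM25.
INDEX = {
    "web":    [(1, 12), (3, 4), (5, 20), (8, 7)],
    "search": [(1, 9), (2, 15), (5, 11), (9, 3)],
    "engine": [(3, 22), (5, 6), (8, 18)],
}

def push(heap, k, score, doc):
    """Keep the k best (score, doc) pairs in a min-heap."""
    if len(heap) < k:
        heapq.heappush(heap, (score, doc))
    elif score > heap[0][0]:
        heapq.heapreplace(heap, (score, doc))

def ranked(pairs):
    """Turn (score, doc) pairs into (doc, score) pairs, best first."""
    return [(d, s) for s, d in sorted(pairs, key=lambda p: (-p[0], p[1]))]

def taat(query, k):
    """Term-at-a-time: read each posting list in full, accumulating partial scores."""
    acc = defaultdict(int)
    for term in query:
        for doc, impact in INDEX.get(term, []):
            acc[doc] += impact
    return ranked([(s, d) for d, s in acc.items()])[:k]

def daat(query, k):
    """Document-at-a-time: walk all lists in parallel, fully scoring one document at a time."""
    lists = [INDEX[t] for t in query if t in INDEX]
    pos = [0] * len(lists)
    heap = []
    while True:
        live = [pl[p][0] for pl, p in zip(lists, pos) if p < len(pl)]
        if not live:
            break
        doc = min(live)  # smallest doc id not yet processed
        score = 0
        for i, pl in enumerate(lists):
            if pos[i] < len(pl) and pl[pos[i]][0] == doc:
                score += pl[pos[i]][1]
                pos[i] += 1
        push(heap, k, score, doc)
    return ranked(heap)

def doc_at(cursor):
    """Doc id under a [term, position] cursor."""
    return INDEX[cursor[0]][cursor[1]][0]

def wand(query, k):
    """Simplified WAND: skip documents whose summed upper bounds cannot beat the k-th best score."""
    cursors = [[t, 0] for t in query if t in INDEX]  # [term, position]
    ub = {t: max(s for _, s in INDEX[t]) for t, _ in cursors}  # per-term max impact
    heap, threshold = [], 0
    while True:
        cursors = [c for c in cursors if c[1] < len(INDEX[c[0]])]  # drop exhausted lists
        cursors.sort(key=doc_at)
        # Pivot: first cursor where the cumulative upper bound exceeds the threshold.
        acc, pivot = 0, None
        for i, c in enumerate(cursors):
            acc += ub[c[0]]
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can enter the top k
        pivot_doc = doc_at(cursors[pivot])
        if doc_at(cursors[0]) == pivot_doc:
            # All cursors up to the pivot sit on pivot_doc: score it fully.
            score = 0
            for c in cursors:
                if doc_at(c) == pivot_doc:
                    score += INDEX[c[0]][c[1]][1]
                    c[1] += 1
            push(heap, k, score, pivot_doc)
            if len(heap) == k:
                threshold = heap[0][0]
        else:
            # Move the leading cursor forward to the pivot document (linear scan).
            c = cursors[0]
            while c[1] < len(INDEX[c[0]]) and doc_at(c) < pivot_doc:
                c[1] += 1
    return ranked(heap)

if __name__ == "__main__":
    q = ["web", "search", "engine"]
    print(taat(q, 3))  # [(5, 37), (3, 26), (8, 25)]
    print(daat(q, 3) == taat(q, 3), wand(q, 3) == taat(q, 3))  # True True
```

On the example query with k = 3, all three functions return the same ranking, which is what makes this kind of dynamic pruning safe. WAND never scores document 9: once the threshold reaches 25, the only remaining list (`search`, upper bound 15) cannot lift any document above it.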

Source journal

Foundations and Trends in Information Retrieval (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 39.10

Self-citation rate: 0.00%

Articles published:

Journal description: The surge in research across all domains over the past decade has led to exponential growth in published research. Navigating this extensive literature and staying current has become a time-consuming challenge. While electronic publishing provides instant access to more articles than ever, identifying the essential ones for a comprehensive understanding of a topic remains difficult. Foundations and Trends® in Information Retrieval (FnTIR) addresses this problem by publishing high-quality survey and tutorial monographs in the field. Each issue features a 50–100-page monograph authored by research leaders, covering tutorial subjects, research retrospectives, and survey papers that provide state-of-the-art reviews within the scope of the journal.