在多核架构上发布列表交集

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval Pub Date : 2011-07-24 DOI:10.1145/2009916.2010045

S. Tatikonda, B. B. Cambazoglu, F. Junqueira

{"title":"在多核架构上发布列表交集","authors":"S. Tatikonda, B. B. Cambazoglu, F. Junqueira","doi":"10.1145/2009916.2010045","DOIUrl":null,"url":null,"abstract":"In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection operation takes a significant fraction of the query processing time, for some queries dominating the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small and independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest-level of granularity is critical to achieve the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method is able to achieve more than five times reduction in average query processing time while still exploiting the parallelism for high query throughput.","PeriodicalId":356580,"journal":{"name":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","volume":"259 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":"{\"title\":\"Posting list intersection on multicore architectures\",\"authors\":\"S. Tatikonda, B. B. Cambazoglu, F. Junqueira\",\"doi\":\"10.1145/2009916.2010045\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection operation takes a significant fraction of the query processing time, for some queries dominating the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small and independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest-level of granularity is critical to achieve the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method is able to achieve more than five times reduction in average query processing time while still exploiting the parallelism for high query throughput.\",\"PeriodicalId\":356580,\"journal\":{\"name\":\"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval\",\"volume\":\"259 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-07-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"46\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2009916.2010045\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2009916.2010045","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 46

摘要

在当前的商业Web搜索引擎中，查询是以合取模式处理的，这要求搜索引擎计算多个发布列表的交集，以确定匹配所有查询条件的文档。在实践中，交集操作占用了查询处理时间的很大一部分，对于某些查询来说，交集操作占据了总查询延迟。因此，高效的发布列表交叉对于实现短查询延迟至关重要。在这项工作中，我们专注于通过利用最新的多核系统的计算能力来提高张贴列表交集的性能。为此，我们考虑了列表交集的各种粗粒度和细粒度并行化模型。具体来说，我们提出了一种算法，该算法将与给定查询相关的工作划分为许多小而独立的任务，然后并行处理这些任务。通过对这些备选模型的详细实证分析，我们证明了在最佳粒度级别上利用并行性对于在多核系统上实现最佳性能至关重要。在八核系统上，细粒度并行化方法能够将平均查询处理时间减少五倍以上，同时仍然利用并行性来实现高查询吞吐量。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Posting list intersection on multicore architectures

In current commercial Web search engines, queries are processed in the conjunctive mode, which requires the search engine to compute the intersection of a number of posting lists to determine the documents matching all query terms. In practice, the intersection operation takes a significant fraction of the query processing time, for some queries dominating the total query latency. Hence, efficient posting list intersection is critical for achieving short query latencies. In this work, we focus on improving the performance of posting list intersection by leveraging the compute capabilities of recent multicore systems. To this end, we consider various coarse-grained and fine-grained parallelization models for list intersection. Specifically, we present an algorithm that partitions the work associated with a given query into a number of small and independent tasks that are subsequently processed in parallel. Through a detailed empirical analysis of these alternative models, we demonstrate that exploiting parallelism at the finest-level of granularity is critical to achieve the best performance on multicore systems. On an eight-core system, the fine-grained parallelization method is able to achieve more than five times reduction in average query processing time while still exploiting the parallelism for high query throughput.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

自引率

0.00%

发文量