I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors

2011 IEEE International Parallel & Distributed Processing Symposium Pub Date : 2011-05-16 DOI:10.1109/IPDPS.2011.106

Deepak Ajwani, Nodari Sitchinava, N. Zeh

{"title":"I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors","authors":"Deepak Ajwani, Nodari Sitchinava, N. Zeh","doi":"10.1109/IPDPS.2011.106","DOIUrl":null,"url":null,"abstract":"The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The obtained algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of $O(sort {P}(N) + K/PB)$ for a number of problems on axis-aligned objects, $P$ denotes the number of cores/processors, $B$ denotes the number of elements that fit in a cache line, $N$ and $K$ denote the sizes of the input and output, respectively, and $sort {P}(N)$ denotes the I/O complexity of sorting $N$ items using $P$ processors in the PEM model. To obtain the above improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.","PeriodicalId":355100,"journal":{"name":"2011 IEEE International Parallel & Distributed Processing Symposium","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Parallel & Distributed Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2011.106","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

Abstract

The parallel external memory (PEM) model has been used as a basis for the design and analysis of a wide range of algorithms for private-cache multi-core architectures. As a tool for developing geometric algorithms in this model, a parallel version of the I/O-efficient distribution sweeping framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The obtained algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of $O(sort {P}(N) + K/PB)$ for a number of problems on axis-aligned objects, $P$ denotes the number of cores/processors, $B$ denotes the number of elements that fit in a cache line, $N$ and $K$ denote the sizes of the input and output, respectively, and $sort {P}(N)$ denotes the I/O complexity of sorting $N$ items using $P$ processors in the PEM model. To obtain the above improvement, we present a new one-dimensional batched range counting algorithm on a sorted list of ranges and points that achieves an I/O complexity of $O((N + K)/PB)$, where $K$ is the sum of the counts of all the ranges. The key to achieving efficient load balancing among the processors in this algorithm is a new method to count the output without enumerating it, which might be of independent interest.

查看原文本刊更多论文

私有缓存芯片多处理器的I/ o最优分布扫描

并行外部存储器(PEM)模型已被用作设计和分析私有缓存多核架构中各种算法的基础。作为开发该模型中的几何算法的工具，最近引入了一个并行版本的I/ o高效分布扫描框架，并利用该框架获得了一些针对轴对齐对象问题的算法。所得算法是有效的，但不是最优的。在本文中，我们改进了该框架，得到了具有最优I/O复杂度为$O(sort {P}(N) + K/PB)$的算法，其中$P$表示内核/处理器的数量，$B$表示缓存行中可容纳的元素的数量，$N$和$K$分别表示输入和输出的大小，$sort {P}(N)$表示PEM模型中使用$P$处理器对$N$项排序的I/O复杂度。为了获得上述改进，我们提出了一种新的一维批量范围计数算法，该算法在范围和点的排序列表上实现I/O复杂度为$O((N + K)/PB)$，其中$K$为所有范围计数的总和。该算法实现处理器间高效负载平衡的关键是采用一种新的方法来计数输出，而不是枚举输出，这可能是独立的兴趣。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2011 IEEE International Parallel & Distributed Processing Symposium

自引率

0.00%

发文量