Out-of-core distribution sort in the FG programming environment

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW). Pub Date: 2010-04-19. DOI: 10.1109/IPDPSW.2010.5470692

P. Natarajan, T. Cormen, E. Strange


Citations: 2

Abstract

We describe the implementation of an out-of-core, distribution-based sorting program on a cluster using FG, a multithreaded programming framework. FG mitigates latency from disk I/O and interprocessor communication by overlapping such high-latency operations with other operations. It does so by constructing and executing a coarse-grained software pipeline on each node of the cluster, where each stage of the pipeline runs in its own thread. The sorting program distributes data among the nodes to create sorted runs, and then it merges sorted runs on each node. When distributing data, the rates at which a node sends and receives data will differ. When merging sorted runs, each node will consume data from each of its sorted runs at varying rates. Under these conditions, a single pipeline running on each node is unwieldy to program and not necessarily efficient.

We describe how we have extended FG to support multiple pipelines on each node in two forms. When a node might send and receive data at different rates during interprocessor communication, we use disjoint pipelines on each node: one pipeline to send and one pipeline to receive. When a node consumes and produces data from different streams on the node, we use multiple pipelines that intersect at a particular stage. Experimental results show that by using multiple pipelines, an out-of-core, distribution-based sorting program outperforms an out-of-core sorting program based on columnsort, taking approximately 75%–85% of the time, despite the advantages that the columnsort-based program holds.
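The idea of several pipelines that intersect at a merge stage can be illustrated with a small sketch. It is not FG's API and not the authors' implementation: it uses Python's standard `threading`, `queue`, and `heapq` modules as stand-ins, in-memory lists stand in for sorted runs on disk, and the names `read_stage`, `stream`, and `merge_runs` and the parameters `buffer_size` and `queue_depth` are hypothetical. Each sorted run gets its own reader thread and bounded queue, and the merge pulls from a given run only when it needs that run's next record, so the runs are consumed at different rates.

```python
import heapq
import queue
import threading

SENTINEL = None  # marks the end of one run's stream of buffers


def read_stage(run, buffer_size, out_q):
    """Stand-in for a disk-read stage: emit one sorted run in fixed-size buffers."""
    for i in range(0, len(run), buffer_size):
        out_q.put(run[i:i + buffer_size])  # blocks while the bounded queue is full
    out_q.put(SENTINEL)


def stream(q):
    """Yield records from one pipeline's queue until its sentinel arrives."""
    while True:
        buf = q.get()  # blocks until the reader thread supplies a buffer
        if buf is SENTINEL:
            return
        yield from buf


def merge_runs(runs, buffer_size=4, queue_depth=2):
    """Merge sorted runs; every run has its own reader thread and bounded queue."""
    queues = [queue.Queue(maxsize=queue_depth) for _ in runs]
    threads = [
        threading.Thread(target=read_stage, args=(run, buffer_size, q))
        for run, q in zip(runs, queues)
    ]
    for t in threads:
        t.start()
    # The merge is the stage the pipelines share: heapq.merge is lazy, so it
    # asks a run's stream for a record only when that run's head was consumed.
    merged = list(heapq.merge(*(stream(q) for q in queues)))
    for t in threads:
        t.join()
    return merged


if __name__ == "__main__":
    runs = [[1, 4, 9, 12], [2, 3, 10, 11, 13], [], [0, 5]]
    print(merge_runs(runs, buffer_size=2))
```

The bounded queues are the design point: a reader that gets ahead of the merge simply blocks, so a run that is consumed slowly does not hold up the readers of runs that are consumed quickly.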
