Efficient disk-to-disk sorting: a case study in the decoupled execution paradigm

Hassan Eslami, Anthony Kougkas, Maria Kotsifakou, T. Kasampalis, Kun Feng, Yin Lu, W. Gropp, Xian-He Sun, Yong Chen, R. Thakur
DOI: 10.1145/2831244.2831249
Published: 2015-11-15 · International Symposium on Design and Implementation of Symbolic Computation Systems
Citations: 3

Abstract

Many applications foreseen for the exascale era will need to process huge amounts of data. However, the I/O infrastructure of current supercomputing architectures cannot scale to handle this volume of data, because of the excessive data movement required from the storage layers to the compute nodes, which limits scalability. There have been extensive studies addressing this challenge. The Decoupled Execution Paradigm (DEP) is an attractive solution due to its unique features, such as fast storage devices close to the computational units and programmable units close to the file system. In this paper we study the effectiveness of DEP for a well-known data-intensive kernel: disk-to-disk (a.k.a. out-of-core) sorting. We propose an optimized algorithm that uses almost all features of DEP, pushing the performance of sorting in HPC even further compared to other existing solutions. Our algorithm gains its advantages by exploiting programmable units close to the parallel file system to achieve higher I/O throughput, compressing data before sending it over the network or to disk, storing intermediate results of the computation close to the compute nodes, and fully overlapping I/O with computation. We also provide an analytical model for our proposed algorithm. Our algorithm achieves 30% better performance than a theoretically optimal sorting algorithm running on the same testbed but not designed to exploit the DEP architecture.
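The sorting kernel the abstract describes builds on the classic external (out-of-core) merge-sort pattern: sort memory-sized runs, compress intermediate runs before writing them to disk, then k-way merge the runs back into one sorted stream. A minimal single-node Python sketch of that pattern follows; it is an illustration of the general technique only, not the authors' DEP implementation, which additionally distributes these phases across compute nodes and storage-side programmable units and overlaps I/O with computation.

```python
import gzip
import heapq
import os
import pickle
import tempfile

def external_sort(values, run_size, workdir):
    """Disk-to-disk (out-of-core) sort sketch: split the input into
    memory-sized runs, sort each run in memory, write compressed runs
    to disk, then k-way merge the runs into one sorted list."""
    run_paths = []
    # Phase 1: produce sorted, compressed runs on disk.
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size])
        path = os.path.join(workdir, f"run{len(run_paths)}.gz")
        with gzip.open(path, "wb") as f:  # compress before writing to disk
            pickle.dump(run, f)
        run_paths.append(path)

    # Phase 2: merge all runs; heapq.merge keeps only one head element
    # per run in memory at a time.
    def read_run(path):
        with gzip.open(path, "rb") as f:
            yield from pickle.load(f)

    return list(heapq.merge(*(read_run(p) for p in run_paths)))

# Usage: ten unsorted values, runs of four.
data = [9, 1, 7, 3, 8, 2, 6, 4, 5, 0]
with tempfile.TemporaryDirectory() as d:
    print(external_sort(data, run_size=4, workdir=d))
    # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In the paper's setting, the per-run compression step corresponds to compressing data before it crosses the network or reaches disk, and the merge phase is what the storage-side programmable units can accelerate.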