Runway: In-transit Data Compression on Heterogeneous HPC Systems

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid). Pub Date: 2023-05-01. DOI: 10.1109/CCGrid57682.2023.00030

J. Ravi, S. Byna, M. Becchi


Citations: 1

Abstract

To alleviate bottlenecks in storing and accessing data on high-performance computing (HPC) systems, I/O libraries are enabling computation while data is in transit, for example through HDFS filters. For scientific applications that commonly use floating-point data, error-bounded lossy compression methods are a critical technique for significantly reducing storage and bandwidth requirements. Thus far, deciding when and where to schedule in-transit data transformations, such as compression, has been outside the scope of I/O libraries. In this paper, we introduce Runway, a runtime framework that enables computation on in-transit data with an object storage abstraction. Runway is designed to be extensible so that it can execute user-defined functions at runtime. In this effort, we focus on studying methods to offload data compression operations to available processing units based on latency and throughput. We compare the performance of running compression on multi-core CPUs with that of offloading it to a GPU and to a Data Processing Unit (DPU). We implement a state-of-the-art error-bounded lossy compression algorithm, SZ3, as a Runway function with a variant optimized for DPUs. We propose dynamic modeling to guide scheduling decisions for in-transit data compression. We evaluate Runway using four scientific datasets from the SDRBench benchmark suite on the Perlmutter supercomputer at NERSC.
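The idea of choosing a processing unit "based on latency and throughput" can be illustrated with a minimal sketch. This is not Runway's actual model or API: the `Device` class, the linear cost model (fixed latency plus size divided by throughput), and all numbers below are assumptions made purely for illustration. A dynamic model would presumably refresh these parameters from runtime measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical per-device cost parameters (not taken from the paper)."""
    name: str
    latency_s: float        # assumed fixed cost to launch/offload one call
    throughput_bps: float   # assumed sustained compression rate, bytes/second


def estimated_time(device: Device, nbytes: int) -> float:
    """Assumed linear cost model: fixed latency plus size over throughput."""
    return device.latency_s + nbytes / device.throughput_bps


def pick_device(devices: list[Device], nbytes: int) -> Device:
    """Return the device with the lowest estimated completion time."""
    if not devices:
        raise ValueError("no devices available")
    return min(devices, key=lambda d: estimated_time(d, nbytes))


# Illustrative numbers only; real values would have to be measured.
DEVICES = [
    Device("cpu", latency_s=0.0001, throughput_bps=1e9),
    Device("gpu", latency_s=0.005, throughput_bps=20e9),
    Device("dpu", latency_s=0.001, throughput_bps=0.5e9),
]

small = pick_device(DEVICES, 64 * 1024)   # 64 KiB buffer
large = pick_device(DEVICES, 1024 ** 3)   # 1 GiB buffer
print(small.name, large.name)
```

Under these made-up parameters, the small buffer stays on the CPU because the offload latency dominates, while the large buffer goes to the GPU because throughput dominates. That crossover is the kind of trade-off a latency/throughput model is meant to capture.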

