Predicting and Comparing the Performance of Array Management Libraries.

Proceedings. IPDPS (Conference), vol. 2020, pp. 906-915. Pub Date: 2020-05-01; Epub Date: 2020-07-14. DOI: 10.1109/ipdps47924.2020.00097
Donghe Kang, Oliver Rübel, Suren Byna, Spyros Blanas
{"title":"Predicting and Comparing the Performance of Array Management Libraries.","authors":"Donghe Kang,&nbsp;Oliver Rübel,&nbsp;Suren Byna,&nbsp;Spyros Blanas","doi":"10.1109/ipdps47924.2020.00097","DOIUrl":null,"url":null,"abstract":"<p><p>Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries is a complex function of the underlying data storage model, user-configurable parameters and object-level access patterns. As a consequence, I/O optimization is predominantly an ad-hoc process that is performed by application developers, who are often domain scientists with limited desire to delve into nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries, as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as model the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics). The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the fastest storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.</p>","PeriodicalId":89233,"journal":{"name":"Proceedings. IPDPS (Conference)","volume":"2020 ","pages":"906-915"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/ipdps47924.2020.00097","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IPDPS (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ipdps47924.2020.00097","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2020/7/14 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad-hoc process that is performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries, as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and new opportunities for optimization. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics). The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the fastest storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
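To make the contrast in storage models concrete, the sketch below writes the same arrays through both libraries and adds a toy end-to-end estimate that combines I/O time, memory-copy cost, and a software-cache discount, in the spirit of the abstract. It is an illustrative sketch under assumed names and parameters (file names, chunk sizes, and the predicted_time formula are hypothetical), not the paper's implementation or its actual cost model.

```python
# Minimal sketch (not the paper's code) contrasting the two storage models:
# HDF5 keeps every object inside one container file, while Zarr materializes
# each object as a directory tree of chunk files. Requires numpy, h5py, and
# zarr (zarr-python 2.x API assumed).
import numpy as np
import h5py
import zarr

data = np.random.rand(1024, 1024)

# HDF5: both datasets end up inside the single file "experiment.h5".
with h5py.File("experiment.h5", "w") as f:
    f.create_dataset("raw", data=data, chunks=(256, 256))
    f.create_dataset("processed", data=data, chunks=(256, 256))

# Zarr: "experiment.zarr/" becomes a directory with one subdirectory per
# array, each holding many small chunk files.
root = zarr.open_group("experiment.zarr", mode="w")
root.create_dataset("raw", data=data, chunks=(256, 256))
root.create_dataset("processed", data=data, chunks=(256, 256))


def predicted_time(io_time_s, memcpy_time_s, cache_hit_fraction):
    """Toy end-to-end estimate: I/O is only one component, transforming data
    to the requested layout adds a memory-copy term, and a software cache
    absorbs a fraction of the I/O. This formula is a hypothetical placeholder
    for illustration, not the analytical model derived in the paper."""
    return io_time_s * (1.0 - cache_hit_fraction) + memcpy_time_s
```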
