Input/output APIs and data organization for high performance scientific computing

2008 3rd Petascale Data Storage Workshop. Pub Date: 2008-11-01. DOI: 10.1109/PDSW.2008.4811881

J. Lofstead, F. Zheng, S. Klasky, K. Schwan


Citations: 15

Abstract

Scientific data management has become essential to the productivity of scientists who use ever larger machines and run applications that produce ever more data. Running on petascale (and larger) machines raises several specific issues. The first is the need for massively parallel data output, which depends in part on the data formats and semantics being used. Here, the way file-system notions of strict, immediate consistency inhibit parallelism can be addressed with "delayed data consistency" methods. Such methods can also remove the runtime coordination steps that immediate consistency requires from machine resources such as Blue Gene's separate networks for barrier calls and its dedicated I/O nodes, freeing those resources to perform other tasks that improve data output performance and/or richness instead. Second, once data is generated, it must be accessible efficiently, which calls for rapid data characterization and indexing. This can be achieved by adding small amounts of metadata during output, allowing scientists to decide quickly which files from a large-scale science run to process. Third, failure probabilities grow with the number of nodes, so output data should be organized to be resilient to failures in which the output of a single node, or of a small number of nodes, is lost or corrupted. This paper demonstrates the utility of delayed consistency methods for data output from the compute nodes of petascale machines. It also demonstrates the advantages of resilient data organization coupled with lightweight methods for data indexing. These techniques are implemented in ADIOS, the Adaptable IO System, and its BP intermediate file format.
The implementation is designed to be compatible with existing, well-known file formats such as HDF-5 and NetCDF, allowing end users to exploit the rich tool chains for these formats. Initial performance evaluations show substantial performance advantages over native parallel HDF-5 in the Chimera supernova code.
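The abstract combines three ideas: writers that output independently and reconcile layout only once (delayed consistency), small per-block metadata that lets a reader skip irrelevant data (characterization and indexing), and a layout in which one damaged block does not spoil the rest (resilience). The following is a minimal Python sketch of how those ideas fit together, assuming a toy in-memory layout. The names `BlockIndex`, `write_block`, `assemble`, and `query` are hypothetical, the choice of min/max as the "characteristics" and of CRC32 for corruption detection are assumptions, and none of this is the ADIOS API or the BP file format.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class BlockIndex:
    """Hypothetical per-writer index entry: block location plus characteristics."""
    rank: int
    offset: int
    length: int
    vmin: float
    vmax: float
    crc: int


def write_block(rank: int, values: list[float]) -> tuple[bytes, BlockIndex]:
    """Serialize one writer's data and compute its metadata locally.

    No coordination with other writers happens here; the final offset is
    filled in later, when the index is assembled (delayed consistency).
    """
    payload = struct.pack(f"<{len(values)}d", *values)
    entry = BlockIndex(rank, 0, len(payload), min(values), max(values),
                       zlib.crc32(payload))
    return payload, entry


def assemble(blocks: list[tuple[bytes, BlockIndex]]) -> tuple[bytearray, list[BlockIndex]]:
    """Single coordination step at close: lay out blocks and build the index."""
    data = bytearray()
    index = []
    for payload, entry in blocks:
        entry.offset = len(data)
        data += payload
        index.append(entry)
    return data, index


def query(data: bytes, index: list[BlockIndex],
          lo: float, hi: float) -> dict[int, list[float]]:
    """Read only blocks whose [vmin, vmax] overlaps [lo, hi]; skip corrupt ones."""
    result = {}
    for e in index:
        if e.vmax < lo or e.vmin > hi:
            continue  # metadata rules this block out without reading its payload
        payload = bytes(data[e.offset:e.offset + e.length])
        if zlib.crc32(payload) != e.crc:
            continue  # damage is isolated to this block; others remain usable
        n = e.length // 8  # 8 bytes per little-endian double
        result[e.rank] = list(struct.unpack(f"<{n}d", payload))
    return result


if __name__ == "__main__":
    # Three hypothetical writers, each producing four values.
    blocks = [write_block(r, [r * 10.0 + i for i in range(4)]) for r in range(3)]
    data, index = assemble(blocks)
    data[index[2].offset] ^= 0xFF  # simulate corruption of rank 2's block
    print(sorted(query(data, index, 5.0, 100.0)))
```

In the usage at the bottom, rank 0 is skipped because its recorded maximum (3.0) lies below the query range, and rank 2 is skipped because its checksum no longer matches, so only rank 1 is returned. The design point the sketch tries to mirror is that both decisions are made from the small index alone, before or without trusting the bulk data.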


