Filesystem Aware Scalable I/O Framework for Data-Intensive Parallel Applications

Rengan Xu, M. Araya-Polo, B. Chapman
{"title":"Filesystem Aware Scalable I/O Framework for Data-Intensive Parallel Applications","authors":"Rengan Xu, M. Araya-Polo, B. Chapman","doi":"10.1109/IPDPSW.2013.196","DOIUrl":null,"url":null,"abstract":"The growing speed gap between CPU and memory makes I/O the main bottleneck of many industrial applications. Some applications need to perform I/O operations for very large volume of data frequently, which will harm the performance seriously. This work's motivation are geophysical applications used for oil and gas exploration. These applications process Terabyte size datasets in HPC facilities. The datasets represent subsurface models and field recorded data. In general term, these applications read as inputs and write as intermediate/final results huge amount of data, where the underlying algorithms implement seismic imaging techniques. The traditional sequential I/O, even when couple with advance storage systems, cannot complete all I/O operations for so large volumes of data in an acceptable time range. Parallel I/O is the general strategy to solve such problems. However, because of the dynamic property of many of these applications, each parallel process does not know the data size it needs to write until its computation is done, and it also cannot identify the position in the file to write. In order to write correctly and efficiently, communication and synchronization are required among all processes to fully exploit the parallel I/O paradigm. To tackle these issues, we use a dynamic load balancing framework that is general enough for most of these applications. And to reduce the expensive synchronization and communication overhead, we introduced a I/O node that only handles I/O request and let compute nodes perform I/O operations in parallel. By using both POSIX I/O and memory-mapping interfaces, the experiment indicates that our approach is scalable. For instance, with 16 processes, the bandwidth of parallel reading can reach the theoretical peak performance (2.5 GB/s) of the storage infrastructure. Also, the parallel writing can be up to 4.68x (speedup, POSIX I/O) and 7.23x (speedup, memory-mapping) more efficient than the serial I/O implementation. Since, most geophysical applications are I/O bounded, these results positively impact the overall performance of the application, and confirm the chosen strategy as path to follow.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.196","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

The growing speed gap between CPU and memory makes I/O the main bottleneck of many industrial applications. Some applications need to perform I/O operations on very large volumes of data frequently, which seriously harms performance. This work is motivated by geophysical applications used for oil and gas exploration. These applications process terabyte-scale datasets in HPC facilities; the datasets represent subsurface models and field-recorded data. In general terms, these applications read huge amounts of data as input and write huge amounts as intermediate/final results, where the underlying algorithms implement seismic imaging techniques. Traditional sequential I/O, even when coupled with advanced storage systems, cannot complete all I/O operations for such large volumes of data in an acceptable time range. Parallel I/O is the general strategy for solving such problems. However, because of the dynamic nature of many of these applications, each parallel process does not know the size of the data it needs to write until its computation is done, and it therefore cannot determine the position in the file at which to write. In order to write correctly and efficiently, communication and synchronization are required among all processes to fully exploit the parallel I/O paradigm. To tackle these issues, we use a dynamic load-balancing framework that is general enough for most of these applications. To reduce the expensive synchronization and communication overhead, we introduce an I/O node that only handles I/O requests and lets the compute nodes perform I/O operations in parallel. Using both POSIX I/O and memory-mapping interfaces, the experiments indicate that our approach is scalable. For instance, with 16 processes, the bandwidth of parallel reading can reach the theoretical peak performance (2.5 GB/s) of the storage infrastructure. Also, parallel writing can be up to 4.68x (POSIX I/O) and 7.23x (memory-mapping) faster than the serial I/O implementation. Since most geophysical applications are I/O bound, these results positively impact the overall performance of the application and confirm the chosen strategy as the path to follow.
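
The abstract sketches the coordination problem at the heart of the framework: a worker only learns how much data it produced once its computation finishes, so something must assign it a file offset before it can write in parallel with the others. The following is a minimal, hypothetical sketch (not the authors' code) of that pattern in C using MPI and POSIX pwrite(): rank 0 plays the dedicated I/O-node role and hands out offsets as a running sum of reported sizes, while every other rank writes its own region concurrently. The file name, message tags, and buffer sizes are illustrative assumptions.

/*
 * Sketch of offset coordination through a dedicated I/O rank.
 * Rank 0 = I/O node (assigns offsets); ranks 1..N-1 = workers.
 * Assumes a shared filesystem visible to all ranks.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define TAG_SIZE   1
#define TAG_OFFSET 2

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const char *path = "output.bin";   /* assumed output file */

    if (rank == 0) {
        /* I/O node: hand out offsets as a running sum of the sizes
         * that workers report once their computation is done. */
        long next_offset = 0;
        for (int i = 1; i < nprocs; i++) {
            long size;
            MPI_Status st;
            MPI_Recv(&size, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_SIZE,
                     MPI_COMM_WORLD, &st);
            MPI_Send(&next_offset, 1, MPI_LONG, st.MPI_SOURCE, TAG_OFFSET,
                     MPI_COMM_WORLD);
            next_offset += size;
        }
    } else {
        /* Worker: the output size is only known after the computation;
         * here it is faked as a rank-dependent buffer. */
        long size = 1024L * rank;
        char *buf = malloc(size);
        memset(buf, 'A' + rank % 26, size);

        long offset;
        MPI_Send(&size, 1, MPI_LONG, 0, TAG_SIZE, MPI_COMM_WORLD);
        MPI_Recv(&offset, 1, MPI_LONG, 0, TAG_OFFSET, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

        /* Each worker writes its own region concurrently; pwrite() takes
         * an explicit offset, so no shared file pointer or lock is needed. */
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd >= 0) {
            pwrite(fd, buf, (size_t)size, (off_t)offset);
            close(fd);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

In the paper's framework the write path could equally use the memory-mapping interface (mmap over the same offsets) instead of pwrite(); the coordination step with the I/O node is unchanged.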