A data streaming model in MPI

Workshop on Exascale MPI. Pub Date: 2015-11-15. DOI: 10.1145/2831129.2831131

I. Peng, S. Markidis, E. Laure, Daniel J. Holmes, Mark Bull


Citations: 24

Abstract

The data streaming model is an effective way to tackle the challenges of data-intensive applications. As traditional HPC applications generate large volumes of data and more data-intensive applications move to HPC infrastructures, it is necessary to investigate the feasibility of combining the message-passing and streaming programming models. MPI, the de facto standard for programming HPC systems, cannot intuitively express the communication patterns and functional operations that streaming models require. In this work, we designed and implemented MPIStream, a data streaming library built atop MPI, to allocate data producers and consumers, to stream data continuously or irregularly, and to process data at run-time. In the same spirit as the STREAM benchmark, we developed a parallel stream benchmark to measure the data processing rate. The performance of the library depends largely on the size of the stream element, the number of data producers and consumers, and the computational intensity of processing one stream element. With 2,048 data producers and 2,048 data consumers in the parallel benchmark, MPIStream achieved a processing rate of 200 GB/s on a Blue Gene/Q supercomputer. We illustrate that a streaming library for HPC applications can effectively enable irregular parallel I/O, application monitoring, and threshold collective operations.
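To make the producer/consumer pattern in the abstract concrete, here is a minimal single-process sketch using only Python's standard library. It is an analogy, not MPI code and not the MPIStream API: threads stand in for MPI processes, a queue stands in for the communication channel, and the names `run_stream` and `processing_rate` are hypothetical, introduced only for illustration. It shows the three knobs the abstract says drive performance: the stream element size, the number of producers and consumers, and the operation applied to each element.

```python
import queue
import threading


def run_stream(n_producers, n_consumers, elements_per_producer,
               element_size, operation):
    """Toy stream (hypothetical helper): producers emit fixed-size elements,
    consumers apply `operation` to each element as it arrives."""
    channel = queue.Queue()      # stands in for the communication channel
    results = []                 # one result per processed stream element
    lock = threading.Lock()
    sentinel = None              # tells a consumer that the stream has ended

    def producer(pid):
        for i in range(elements_per_producer):
            # A stream element is modelled as a block of element_size bytes.
            channel.put(bytes([(pid + i) % 256]) * element_size)

    def consumer():
        while True:
            element = channel.get()
            if element is sentinel:
                break
            value = operation(element)   # process on arrival, at run time
            with lock:
                results.append(value)

    consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
    producers = [threading.Thread(target=producer, args=(p,))
                 for p in range(n_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    # All elements are queued by now, so one sentinel per consumer ends the run.
    for _ in consumers:
        channel.put(sentinel)
    for t in consumers:
        t.join()
    return results


def processing_rate(total_bytes, seconds):
    """Bytes processed per second, the quantity a stream benchmark reports."""
    return total_bytes / seconds


if __name__ == "__main__":
    sizes = run_stream(n_producers=4, n_consumers=2,
                       elements_per_producer=10, element_size=8,
                       operation=len)
    print(len(sizes), "elements,", sum(sizes), "bytes processed")
```

For a sense of scale, if the reported 200 GB/s were split evenly across the 2,048 consumers (an assumption the abstract does not state), each consumer would process roughly 0.1 GB/s.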

