Design and evaluation of a collective IO model for loosely coupled petascale programming

Zhao Zhang, Allan Espinosa, K. Iskra, I. Raicu, Ian T Foster, M. Wilde
DOI: 10.1109/MTAGS.2008.4777908 (https://doi.org/10.1109/MTAGS.2008.4777908)
Published in: 2008 Workshop on Many-Task Computing on Grids and Supercomputers
Publication date: 2008-11-01
Citations: 46

Abstract

Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) that focuses on passing data between programs as ordinary files rather than as messages. It has the significant benefits of decoupling producer and consumer and of allowing existing application programs to be executed in parallel with no recoding. However, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but only through careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes it easier to program large-scale clusters using a loosely coupled model. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to distribute common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model and its implementation on the Blue Gene/P supercomputer, and we present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application.
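The pattern the abstract describes (broadcast common input to node-local caches, run tasks against the local copy, then gather the many small outputs into one write to shared storage) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the temporary directories stand in for the shared file system and for per-node local file systems, and the names `broadcast`, `run_task`, `gather`, `demo`, `common.dat`, and `results.tar` are all hypothetical.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def broadcast(common_file: Path, node_caches: list[Path]) -> None:
    """Copy one common input file into every node-local cache directory."""
    for cache in node_caches:
        shutil.copy(common_file, cache / common_file.name)


def run_task(cache: Path, task_id: int) -> Path:
    """Hypothetical task: read the locally cached input, write output locally."""
    data = (cache / "common.dat").read_text()
    out = cache / f"out_{task_id}.txt"
    out.write_text(f"task {task_id} read {len(data)} characters\n")
    return out


def gather(outputs: list[Path], archive: Path) -> None:
    """Bundle many small output files into a single archive on shared storage."""
    with tarfile.open(archive, "w") as tar:
        for out in outputs:
            tar.add(out, arcname=out.name)


def demo(num_nodes: int = 4) -> list[str]:
    """Run broadcast -> tasks -> gather and return the archived file names."""
    with tempfile.TemporaryDirectory() as root_name:
        root = Path(root_name)

        # Stand-in for the shared file system (assumption for this sketch).
        shared = root / "shared"
        shared.mkdir()
        common = shared / "common.dat"
        common.write_text("x" * 1000)

        # Stand-ins for node-local file systems used as caches.
        caches = []
        for i in range(num_nodes):
            cache = root / f"node{i}"
            cache.mkdir()
            caches.append(cache)

        broadcast(common, caches)
        outputs = [run_task(cache, i) for i, cache in enumerate(caches)]

        archive = shared / "results.tar"
        gather(outputs, archive)

        with tarfile.open(archive) as tar:
            return sorted(tar.getnames())


if __name__ == "__main__":
    print(demo())
```

In this sketch each task touches shared storage zero times: the common input is read once per node during the broadcast, and all outputs reach shared storage as one archive rather than as many small files, which is the kind of access pattern the abstract says otherwise requires manual tuning.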