Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces

C. Docan, M. Parashar, J. Cummings, S. Klasky
{"title":"Moving the Code to the Data - Dynamic Code Deployment Using ActiveSpaces","authors":"C. Docan, M. Parashar, J. Cummings, S. Klasky","doi":"10.1109/IPDPS.2011.120","DOIUrl":null,"url":null,"abstract":"Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data has to be extracted off the computing nodes and transported to consumer nodes so that it can be processed, analyzed, visualized, archived, etc. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive I/O operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still has to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data-processing code to the staging area rather than moving the data. Specifically, we present the Active Spaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area, and (2) run-time mechanisms for transporting binary codes associated with these routines to the staging area, executing the routines on the nodes of the staging area, and returning the results. We also present an experimental performance evaluation of Active Spaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize the sweet spots for each option.","PeriodicalId":355100,"journal":{"name":"2011 IEEE International Parallel & Distributed Processing Symposium","volume":"63 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Parallel & Distributed Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2011.120","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35

Abstract

Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data has to be extracted off the computing nodes and transported to consumer nodes so that it can be processed, analyzed, visualized, archived, etc. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive I/O operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still has to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data-processing code to the staging area rather than moving the data. Specifically, we present the Active Spaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area, and (2) run-time mechanisms for transporting binary codes associated with these routines to the staging area, executing the routines on the nodes of the staging area, and returning the results. We also present an experimental performance evaluation of Active Spaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize the sweet spots for each option.
将代码移动到数据-使用ActiveSpaces进行动态代码部署
管理由新兴科学和工程模拟产生的大量数据已经成为一个关键的挑战。数据必须从计算节点中提取出来并传输到消费者节点,以便对其进行处理、分析、可视化、存档等。最近的几项研究工作在不同层面上解决了与数据相关的挑战。一种有吸引力的方法是将昂贵的I/O操作卸载到一组较小的专用计算节点(称为暂放区)上。然而,即使使用这种方法,数据仍然必须从暂存区域移动到消费者节点进行处理,这仍然是一个瓶颈。在本文中,我们研究了另一种方法,即将数据处理代码移动到暂存区域,而不是移动数据。具体来说,我们提出了Active Spaces框架,它提供了(1)编程支持,用于定义要下载到登台区域的数据处理例程;(2)运行时机制,用于将与这些例程相关的二进制代码传输到登台区域,在登台区域的节点上执行例程,并返回结果。我们还介绍了在橡树岭国家实验室的Cray XT5上运行的应用程序对Active Spaces的实验性能评估。最后,我们使用一个耦合融合应用程序工作流来探索在耦合期间传输数据和传输数据处理所需的代码之间的权衡,并描述每个选项的最佳点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信