Smart:一个类似mapreduce的框架，用于现场科学分析

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2015-11-15 DOI:10.1145/2807591.2807650

Yi Wang, G. Agrawal, Tekin Bicer, Wei Jiang

{"title":"Smart:一个类似mapreduce的框架，用于现场科学分析","authors":"Yi Wang, G. Agrawal, Tekin Bicer, Wei Jiang","doi":"10.1145/2807591.2807650","DOIUrl":null,"url":null,"abstract":"In-situ analytics has lately been shown to be an effective approach to reduce both I/O and storage costs for scientific analytics. Developing an efficient in-situ implementation, however, involves many challenges, including parallelization, data movement or sharing, and resource allocation. Based on the premise that MapReduce can be an appropriate API for specifying scientific analytics applications, we present a novel MapReduce-like framework that supports efficient in-situ scientific analytics, and address several challenges that arise in applying the MapReduce idea for in-situ processing. Specifically, our implementation can load simulated data directly from distributed memory, and it uses a modified API that helps meet the strict memory constraints of in-situ analytics. The framework is designed so that analytics can be launched from the parallel code region of a simulation program. We have developed both time sharing and space sharing modes for maximizing the performance in different scenarios, with the former even avoiding any copying of data from simulation to the analytics program. We demonstrate the functionality, efficiency, and scalability of our system, by using different simulation and analytics programs, executed on clusters with multi-core and many-core nodes.","PeriodicalId":117494,"journal":{"name":"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"330 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"50","resultStr":"{\"title\":\"Smart: a MapReduce-like framework for in-situ scientific analytics\",\"authors\":\"Yi Wang, G. Agrawal, Tekin Bicer, Wei Jiang\",\"doi\":\"10.1145/2807591.2807650\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In-situ analytics has lately been shown to be an effective approach to reduce both I/O and storage costs for scientific analytics. Developing an efficient in-situ implementation, however, involves many challenges, including parallelization, data movement or sharing, and resource allocation. Based on the premise that MapReduce can be an appropriate API for specifying scientific analytics applications, we present a novel MapReduce-like framework that supports efficient in-situ scientific analytics, and address several challenges that arise in applying the MapReduce idea for in-situ processing. Specifically, our implementation can load simulated data directly from distributed memory, and it uses a modified API that helps meet the strict memory constraints of in-situ analytics. The framework is designed so that analytics can be launched from the parallel code region of a simulation program. We have developed both time sharing and space sharing modes for maximizing the performance in different scenarios, with the former even avoiding any copying of data from simulation to the analytics program. We demonstrate the functionality, efficiency, and scalability of our system, by using different simulation and analytics programs, executed on clusters with multi-core and many-core nodes.\",\"PeriodicalId\":117494,\"journal\":{\"name\":\"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis\",\"volume\":\"330 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"50\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2807591.2807650\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"SC15: International Conference for High Performance Computing, Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2807591.2807650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 50

摘要

原位分析最近被证明是一种有效的方法，可以降低科学分析的I/O和存储成本。然而，开发高效的原位实现涉及许多挑战，包括并行化、数据移动或共享以及资源分配。基于MapReduce可以成为指定科学分析应用程序的合适API的前提，我们提出了一个新颖的类似MapReduce的框架，支持高效的原位科学分析，并解决了在将MapReduce思想应用于原位处理时出现的几个挑战。具体来说，我们的实现可以直接从分布式内存中加载模拟数据，并且它使用经过修改的API来帮助满足原位分析的严格内存限制。该框架的设计使分析可以从仿真程序的并行代码区域启动。我们已经开发了时间共享和空间共享模式，以在不同的场景中最大化性能，前者甚至避免了从模拟到分析程序的任何数据复制。通过使用不同的仿真和分析程序，在具有多核和多核节点的集群上执行，我们演示了系统的功能、效率和可扩展性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Smart: a MapReduce-like framework for in-situ scientific analytics

In-situ analytics has lately been shown to be an effective approach to reduce both I/O and storage costs for scientific analytics. Developing an efficient in-situ implementation, however, involves many challenges, including parallelization, data movement or sharing, and resource allocation. Based on the premise that MapReduce can be an appropriate API for specifying scientific analytics applications, we present a novel MapReduce-like framework that supports efficient in-situ scientific analytics, and address several challenges that arise in applying the MapReduce idea for in-situ processing. Specifically, our implementation can load simulated data directly from distributed memory, and it uses a modified API that helps meet the strict memory constraints of in-situ analytics. The framework is designed so that analytics can be launched from the parallel code region of a simulation program. We have developed both time sharing and space sharing modes for maximizing the performance in different scenarios, with the former even avoiding any copying of data from simulation to the analytics program. We demonstrate the functionality, efficiency, and scalability of our system, by using different simulation and analytics programs, executed on clusters with multi-core and many-core nodes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

自引率

0.00%

发文量