基于云和网格的数据密集型计算的编程抽象

Christopher Miceli, M. Miceli, S. Jha, Hartmut Kaiser, André Merzky
{"title":"基于云和网格的数据密集型计算的编程抽象","authors":"Christopher Miceli, M. Miceli, S. Jha, Hartmut Kaiser, André Merzky","doi":"10.1109/CCGRID.2009.87","DOIUrl":null,"url":null,"abstract":"MapReduce has emerged as an important data-parallel programming model for data-intensive computing – for Clouds and Grids. However most if not all implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms – Grids, Cloud-like infrastructure and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA, by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical in order to implement the ability to move computational work to the data. This is required to keep data network transfer low and in the case of commercial Clouds the monetary cost of computing the solution low. Using data-sets of size up to 10GB, and upto 10 workers, we provide detailed performance analysis of the SAGA-MapReduce implementation, and show how controllingthe distribution of computation and the payload per worker helps enhance performance.","PeriodicalId":118263,"journal":{"name":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"Programming Abstractions for Data Intensive Computing on Clouds and Grids\",\"authors\":\"Christopher Miceli, M. Miceli, S. Jha, Hartmut Kaiser, André Merzky\",\"doi\":\"10.1109/CCGRID.2009.87\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"MapReduce has emerged as an important data-parallel programming model for data-intensive computing – for Clouds and Grids. However most if not all implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms – Grids, Cloud-like infrastructure and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA, by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical in order to implement the ability to move computational work to the data. This is required to keep data network transfer low and in the case of commercial Clouds the monetary cost of computing the solution low. Using data-sets of size up to 10GB, and upto 10 workers, we provide detailed performance analysis of the SAGA-MapReduce implementation, and show how controllingthe distribution of computation and the payload per worker helps enhance performance.\",\"PeriodicalId\":118263,\"journal\":{\"name\":\"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-05-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2009.87\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2009.87","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43

摘要

MapReduce已经成为一个重要的数据并行编程模型,用于数据密集型计算——云和网格。然而,MapReduce的大多数(如果不是全部的话)实现都与特定的基础设施相耦合。SAGA是一个高级编程接口,它提供了以独立于基础设施的方式创建分布式应用程序的能力。在本文中,我们展示了MapReduce是如何使用SAGA实现的,并演示了它在不同分布式平台(网格、类云基础设施和云)之间的互操作性。通过展示基于SAGA的实现是独立于基础设施的,同时仍然提供对部署、分发和运行时分解的控制,我们讨论了使用SAGA以编程方式开发MapReduce的优势。为了实现将计算工作移动到数据上的能力,控制计算单元(工作者)的分布和位置的能力至关重要。这是保持低数据网络传输所需的,并且在商业云的情况下,计算解决方案的货币成本较低。使用最大10GB的数据集和最多10个worker,我们提供了SAGA-MapReduce实现的详细性能分析,并展示了如何控制计算分布和每个worker的有效负载有助于提高性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Programming Abstractions for Data Intensive Computing on Clouds and Grids
MapReduce has emerged as an important data-parallel programming model for data-intensive computing – for Clouds and Grids. However most if not all implementations of MapReduce are coupled to a specific infrastructure. SAGA is a high-level programming interface which provides the ability to create distributed applications in an infrastructure independent way. In this paper, we show how MapReduce has been implemented using SAGA and demonstrate its interoperability across different distributed platforms – Grids, Cloud-like infrastructure and Clouds. We discuss the advantages of programmatically developing MapReduce using SAGA, by demonstrating that the SAGA-based implementation is infrastructure independent whilst still providing control over the deployment, distribution and runtime decomposition. The ability to control the distribution and placement of the computation units (workers) is critical in order to implement the ability to move computational work to the data. This is required to keep data network transfer low and in the case of commercial Clouds the monetary cost of computing the solution low. Using data-sets of size up to 10GB, and upto 10 workers, we provide detailed performance analysis of the SAGA-MapReduce implementation, and show how controllingthe distribution of computation and the payload per worker helps enhance performance.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信