Synchronous Parallel Processing of Big-Data Analytics Services to Optimize Performance in Federated Clouds

2012 IEEE Fifth International Conference on Cloud Computing. Pub Date: 2012-06-24. DOI: 10.1109/CLOUD.2012.108

Gueyoung Jung, N. Gnanasambandam, Tridib Mukherjee

Citations: 68

Abstract

Parallelizing big-data analytics services over a federation of heterogeneous clouds has been considered as a way to improve performance. However, contrary to common intuition, there is an inherent tradeoff between the level of parallelism and the performance of big-data analytics, principally because transferring big data over the network incurs a significant delay. The data transfer delay can be comparable to, or even higher than, the time required to compute on the data. To address this tradeoff, this paper determines: (a) how many and which computing nodes in federated clouds should be used for parallel execution of big-data analytics; (b) how to opportunistically apportion big data to these computing nodes so that they complete in a synchronized manner at best-effort performance; and (c) the sequence of apportioned big-data chunks of different sizes to be computed in each node, so that the transfer of a chunk overlaps as much as possible with the computation of the previous chunk in that node. To this end, the Maximally Overlapped Bin-packing driven Bursting (MOBB) algorithm is proposed, which improves performance by up to 60% against existing approaches.
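The ideas in points (a) to (c) can be illustrated with a small sketch. This is not the MOBB algorithm, whose details the abstract does not give. It is a toy model under stated assumptions: each node is described by a hypothetical startup delay and a constant effective processing rate, and within a node the chunks are transferred one after another while earlier chunks are computed. The function names `apportion` and `pipeline_makespan` and all numbers are invented for illustration.

```python
from typing import List, Tuple


def apportion(total_mb: float,
              nodes: List[Tuple[float, float]]) -> Tuple[float, List[float]]:
    """Split total_mb so that every node that is used finishes at the same time.

    nodes holds one (startup_delay_s, rate_mb_per_s) pair per node; both
    values are assumptions of this toy model, not quantities from the paper.
    Returns (finish_time_s, share_mb_per_node). Nodes whose startup delay
    is not shorter than the common finish time get a share of 0.
    """
    order = sorted(range(len(nodes)), key=lambda i: nodes[i][0])
    finish = float("inf")
    rate_sum = 0.0
    delay_rate_sum = 0.0
    used = []
    for i in order:
        delay, rate = nodes[i]
        if delay >= finish:
            break  # this node could not start before the others are done
        used.append(i)
        rate_sum += rate
        delay_rate_sum += delay * rate
        # Solve sum(rate_i * (finish - delay_i)) == total_mb for finish.
        finish = (total_mb + delay_rate_sum) / rate_sum
    shares = [0.0] * len(nodes)
    for i in used:
        delay, rate = nodes[i]
        shares[i] = rate * (finish - delay)
    return finish, shares


def pipeline_makespan(chunks_mb: List[float],
                      transfer_rate: float,
                      compute_rate: float) -> float:
    """Completion time on one node when chunk k+1 is transferred while
    chunk k is being computed (rates in MB/s, assumed constant)."""
    transfer_done = 0.0
    compute_done = 0.0
    for size in chunks_mb:
        transfer_done += size / transfer_rate
        # A chunk is computed once it has arrived and the node is free.
        compute_done = max(compute_done, transfer_done) + size / compute_rate
    return compute_done


if __name__ == "__main__":
    finish, shares = apportion(100.0, [(0.0, 10.0), (2.0, 10.0), (100.0, 50.0)])
    print(finish, shares)
    print(pipeline_makespan([40.0], 10.0, 10.0))
    print(pipeline_makespan([10.0, 10.0, 10.0, 10.0], 10.0, 10.0))
```

In this toy model the third node is left unused even though it is the fastest, because its startup delay exceeds the time the other two nodes need to finish together. That mirrors the tradeoff the abstract describes between parallelism and transfer delay. The second function shows why chunking matters: splitting the same 40 MB into four chunks lets transfer and computation overlap and shortens the completion time on a node.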
