Improving Performance for Geo-Distributed Data Process in Wide-Area

2017 IEEE International Conference on Computer and Information Technology (CIT). Pub Date: 2017-08-01. DOI: 10.1109/CIT.2017.48

Ge Zhang, Haozhan Wang, Zhongzhi Luan, Weiguo Wu, D. Qian

Citations: 2

Abstract

Many organizations and end sensors produce massive amounts of data around the globe. To analyze this data as a whole, the traditional approach is to copy all of it to a central datacenter. This is neither practical nor efficient, because the volume of data to transfer is huge and network bandwidth is limited. Data privacy may also be a concern. Instead of transferring data, we believe that moving the computation to where the data resides is a better way to solve this problem. In this paper, we design an algorithm for geo-distributed big data processing that is both data-aware and network-aware. Considering the characteristics of the computation, we exploit data dependencies to determine data locality, and we use integer linear programming (ILP) to make the algorithm network-aware. Our algorithm is implemented on top of Spark. In our experiments, it improves the performance of geo-distributed data processing by 22%.
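The abstract does not give the ILP itself, so what follows is only a minimal sketch of how a data-aware, network-aware placement could be formulated, under assumptions that are not taken from the paper. It assumes that each reduce task depends on intermediate data spread across several sites (the data dependency), that input already stored at a task's site is read locally, that each inter-site link has a fixed bandwidth, and that links transfer in parallel. Placement then becomes a min-max integer program: pick exactly one site per task (binary variables x_{t,j} with the sum over j equal to 1) to minimize T, subject to the data each link carries divided by its bandwidth being at most T. The names `place_tasks`, `locality_only`, `link_times`, `makespan`, `DATA`, and `BW` are hypothetical, and the numbers are illustrative. Rather than calling an ILP solver, the sketch enumerates every assignment, which reaches the same optimum but only scales to tiny instances.

```python
from itertools import product

# Illustrative inputs (hypothetical, not taken from the paper).
# DATA[i][t]: MB of intermediate data for reduce task t stored at site i.
DATA = [
    [100.0, 50.0],  # site 0
    [90.0, 40.0],   # site 1
]
# BW[i][j]: bandwidth in MB/s from site i to site j (diagonal unused).
BW = [
    [None, 100.0],  # site 0 -> site 1 is fast
    [10.0, None],   # site 1 -> site 0 is slow
]


def link_times(assign, data, bw):
    """Transfer time on each inter-site link (src, dst) for a placement.

    assign[t] is the site that runs task t. Input already stored at that
    site is read locally, so only remote input crosses the network.
    """
    load = {}
    for t, dst in enumerate(assign):
        for src in range(len(data)):
            if src != dst and data[src][t] > 0:
                load[(src, dst)] = load.get((src, dst), 0.0) + data[src][t]
    return {(src, dst): mb / bw[src][dst] for (src, dst), mb in load.items()}


def makespan(assign, data, bw):
    """Completion time of the transfer, assuming links run in parallel."""
    return max(link_times(assign, data, bw).values(), default=0.0)


def place_tasks(data, bw):
    """Exact min-max placement by enumerating every integer assignment.

    Finds the same optimum as the integer program: each task goes to
    exactly one site and the slowest link is minimized.
    """
    n_sites, n_tasks = len(data), len(data[0])
    best = None
    for assign in product(range(n_sites), repeat=n_tasks):
        cost = makespan(assign, data, bw)
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


def locality_only(data):
    """Baseline: run each task at the site holding most of its input."""
    n_sites, n_tasks = len(data), len(data[0])
    return tuple(
        max(range(n_sites), key=lambda i: data[i][t]) for t in range(n_tasks)
    )


if __name__ == "__main__":
    print(place_tasks(DATA, BW))           # (1.5, (1, 1))
    base = locality_only(DATA)
    print(base, makespan(base, DATA, BW))  # (0, 0) 13.0
```

With these numbers, pure data locality puts both tasks at site 0 and pushes 130 MB over the slow 10 MB/s link (13 s), while the network-aware placement runs both tasks at site 1 and finishes the transfer in 1.5 s. A practical version would likely replace the enumeration with an ILP solver and measured bandwidths.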