Improving Performance for Geo-Distributed Data Process in Wide-Area

Ge Zhang, Haozhan Wang, Zhongzhi Luan, Weiguo Wu, D. Qian
Published in: 2017 IEEE International Conference on Computer and Information Technology (CIT), 2017-08-01
DOI: 10.1109/CIT.2017.48 (https://doi.org/10.1109/CIT.2017.48)
Citations: 2

Abstract

Organizations and end sensors around the globe produce massive amounts of data. The traditional way to analyze this data as a whole is to copy all of it to a central datacenter. This is neither practical nor efficient, given the huge volume of data to transfer and the limited network bandwidth. Data privacy may also matter. Instead of transferring data, we believe that moving the computation to where the data resides is a better way to solve this problem. In this paper, we design an algorithm for geo-distributed big data processing that is both data-aware and network-aware. Considering the computation's characteristics, we take advantage of data dependencies to determine data locality, and we use integer linear programming (ILP) to achieve network awareness. Our algorithm is implemented on top of Spark. In our experiments, it improves the performance of geo-distributed data processing by 22%.
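
The abstract does not give the ILP formulation, so the following is only a minimal sketch of what a network-aware task placement could look like, not the authors' algorithm. It assumes each task's input is spread over several sites, each site has a fixed number of task slots, and the objective is the total time spent moving input over the wide-area links. The function name `place_tasks`, its parameters, and the objective are all hypothetical. For brevity the integer program is solved by exhaustive search over assignments rather than by an ILP solver, which is only feasible for very small instances.

```python
from itertools import product


def place_tasks(data_mb, bandwidth_mb_s, slots):
    """Pick one site per task so that total WAN transfer time is minimal.

    All inputs are illustrative assumptions, not taken from the paper:
      data_mb[t][s]        -- MB of task t's input stored at site s
      bandwidth_mb_s[s][d] -- bandwidth of the link from site s to site d (MB/s)
      slots[s]             -- maximum number of tasks site s can run
    Returns (assignment, cost), where assignment[t] is the site of task t.
    """
    n_tasks, n_sites = len(data_mb), len(slots)

    def transfer_time(task, dest):
        # Input already stored at dest is free; the rest crosses the WAN.
        return sum(
            data_mb[task][src] / bandwidth_mb_s[src][dest]
            for src in range(n_sites)
            if src != dest and data_mb[task][src] > 0
        )

    best, best_cost = None, float("inf")
    # Enumerate every integer assignment; an ILP solver would prune this space.
    for assignment in product(range(n_sites), repeat=n_tasks):
        # Capacity constraint: no site may exceed its slot count.
        if any(assignment.count(s) > slots[s] for s in range(n_sites)):
            continue
        cost = sum(transfer_time(t, d) for t, d in enumerate(assignment))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


# Two sites, three tasks; the link 0 -> 1 is faster than the link 1 -> 0.
data = [[100, 0], [0, 200], [50, 50]]
bandwidth = [[0, 10], [5, 0]]
print(*place_tasks(data, bandwidth, slots=[2, 2]))  # prints: (0, 1, 1) 5.0
```

In this toy instance the first two tasks stay where their data is (the data-aware part), while the third task, whose input is split evenly, goes to site 1 because sending 50 MB over the faster link costs less (the network-aware part).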