Data deduplication in a hybrid architecture for improving write performance

Chao Chen, Jonathan Bastnagel, Yong Chen
Venue: International Workshop on Runtime and Operating Systems for Supercomputers
Published: 2013-06-10
DOI: https://doi.org/10.1145/2491661.2481435
Citations: 1

Abstract

Big Data computing provides a promising new opportunity for scientific discovery and innovation. However, it also poses a significant challenge to the high-end computing community. An effective I/O solution is urgently required to support big data applications running on high-end computing systems. In this study, we propose a new approach, DDiHA (Data Deduplication in Hybrid Architecture), to improve write performance for write-intensive big data applications. The DDiHA approach uses data deduplication to reduce the size of data volumes before they are transferred and written to storage. A hybrid architecture is introduced to facilitate data deduplication. Both a theoretical study and prototype verification were conducted to evaluate the DDiHA approach. The initial results show that, given the same compute resources, the DDiHA system outperformed the conventional architecture, even though it introduces additional computation workload from data deduplication. The DDiHA approach reduces the amount of data transferred across the network and improves I/O system performance. It shows promising potential for write-intensive big data applications.
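The core idea of shrinking data before it crosses the network can be illustrated with a minimal sketch. It is not the DDiHA implementation: the abstract does not say how data is chunked or fingerprinted, so the fixed 4 KiB chunk size, the SHA-256 fingerprints, and the function names `deduplicate`, `reconstruct`, and `payload_bytes` are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size; the abstract does not specify one


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (store, recipe):
      store  maps a chunk fingerprint to the chunk bytes (unique chunks only)
      recipe lists fingerprints in original order, so the data can be rebuilt
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # SHA-256 is an assumed fingerprint function, chosen for illustration.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # keep the first copy only
        recipe.append(fingerprint)
    return store, recipe


def reconstruct(store, recipe) -> bytes:
    """Rebuild the original byte stream from the unique chunks and the recipe."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def payload_bytes(store) -> int:
    """Bytes of chunk data that would still be sent (recipe overhead ignored)."""
    return sum(len(chunk) for chunk in store.values())
```

For a buffer built as `(b"A" * 4096 + b"B" * 4096) * 8`, only two unique chunks remain after `deduplicate`, and `reconstruct` returns the original bytes. The fingerprinting work is the extra computation the abstract mentions; the smaller payload is what reduces network transfer and write volume.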