Giacomo Grangia, Quanqing Xu, A. Bianco, P. Giaccone
Title: Balancing the Storage in a Deduplication Cluster
Published in: 2017 International Conference on Networking, Architecture, and Storage (NAS), 2017-08-01
DOI: 10.1109/NAS.2017.8026846 (https://doi.org/10.1109/NAS.2017.8026846)
Citations: 2
Abstract
We consider an in-line data deduplication system that backs up data from many clients to a cluster of storage servers. We propose a centralized synchronous approach, denoted as GateD, that orchestrates the deduplication operations. Under GateD, the deduplication requests from multiple clients are gathered in a time window and then processed together. This lets the centralized controller explore a larger solution space when allocating data to the deduplication nodes, so that it can balance storage occupancy across the nodes, with a beneficial effect on the performance perceived by the clients and without sacrificing deduplication efficiency. We investigate the performance through a detailed simulation model applied to real deduplication traces and show that GateD outperforms other state-of-the-art deduplication schemes.
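The windowed, centralized allocation described in the abstract can be illustrated with a minimal Python sketch. This is not the GateD algorithm itself, which the abstract does not specify. It assumes a hypothetical greedy rule (largest request first, each sent to the currently least-occupied node), assumes each request carries the number of new, non-duplicate bytes it would add, assumes at most one request per client per window, and ignores deduplication locality entirely. The names `batch_by_window` and `assign_window` are made up for illustration.

```python
from collections import defaultdict


def batch_by_window(timed_requests, window):
    """Group (timestamp, client_id, new_bytes) tuples into fixed time windows.

    Returns a list of batches ordered by window index; each batch is a list
    of (client_id, new_bytes) pairs. The window length is an assumed parameter.
    """
    batches = defaultdict(list)
    for ts, client_id, new_bytes in timed_requests:
        batches[int(ts // window)].append((client_id, new_bytes))
    return [batches[k] for k in sorted(batches)]


def assign_window(requests, node_usage):
    """Place one window of requests on nodes to even out storage occupancy.

    Hypothetical heuristic, not taken from the paper: handle the largest
    request first and send each one to the node with the least storage used.
    Ties are broken by node name so the result is deterministic.
    """
    usage = dict(node_usage)  # copy so the caller's state is not mutated
    placement = {}
    for client_id, new_bytes in sorted(requests, key=lambda r: r[1], reverse=True):
        target = min(usage, key=lambda n: (usage[n], n))
        placement[client_id] = target
        usage[target] += new_bytes
    return placement, usage


if __name__ == "__main__":
    trace = [(0.1, "a", 50), (0.9, "b", 30), (1.2, "c", 20)]
    usage = {"n1": 0, "n2": 0}
    for batch in batch_by_window(trace, window=1.0):
        placement, usage = assign_window(batch, usage)
        print(placement, usage)
```

Seeing a whole window at once is what allows the sort in `assign_window`: a controller that handled requests one at a time on arrival could not reorder them, which is one simple way to read the abstract's point about a larger solution space.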