Balancing the Storage in a Deduplication Cluster
Giacomo Grangia, Quanqing Xu, A. Bianco, P. Giaccone
2017 International Conference on Networking, Architecture, and Storage (NAS), August 2017
DOI: 10.1109/NAS.2017.8026846
We consider an in-line data deduplication system that backs up data from many clients to a cluster of storage servers. We propose a centralized synchronous approach, called GateD, that orchestrates the deduplication operations. In GateD, the deduplication requests from multiple clients are gathered over a time window and then processed together as a batch. This lets the centralized controller explore a larger space of solutions when allocating data to the deduplication nodes, so that storage occupancy is balanced across the nodes. Balanced occupancy improves the final performance perceived by the clients without sacrificing deduplication efficiency. We evaluate the performance through a detailed simulation model driven by real deduplication traces and show that GateD outperforms other state-of-the-art deduplication schemes.
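The abstract does not spell out GateD's allocation algorithm. As an illustration only, the Python sketch below shows why gathering requests in a window can help balance storage. It rests on a hypothetical model: each request is a mapping from chunk fingerprint to chunk size. Each request is routed to the node that would have to store the fewest new bytes, so deduplication is not sacrificed. Ties go to the least-occupied node, and a window is processed largest request first, in the style of greedy LPT scheduling. The names `Node` and `process_window` and this greedy policy are assumptions for the example, not the paper's method.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical deduplication node: stored fingerprints and bytes in use."""
    node_id: int
    used: int = 0
    stored: set = field(default_factory=set)

    def new_bytes(self, chunks):
        """Bytes this node would have to write for `chunks` ({fingerprint: size})."""
        return sum(size for fp, size in chunks.items() if fp not in self.stored)

    def store(self, chunks):
        """Write only the chunks not already present (deduplicate on write)."""
        for fp, size in chunks.items():
            if fp not in self.stored:
                self.stored.add(fp)
                self.used += size


def process_window(nodes, requests):
    """Assign one window of (client, {fingerprint: size}) requests to nodes.

    Assumed greedy policy (illustrative only): largest request first; among the
    nodes that would store the fewest new bytes, pick the least-occupied one.
    Returns the list of (client, node_id) assignments in processing order.
    """
    order = sorted(requests, key=lambda r: sum(r[1].values()), reverse=True)
    assignments = []
    for client, chunks in order:
        # Deduplication first: keep only the nodes with the minimum new bytes.
        costs = {n.node_id: n.new_bytes(chunks) for n in nodes}
        best = min(costs.values())
        candidates = [n for n in nodes if costs[n.node_id] == best]
        # Balancing second: the least-occupied candidate wins (ties by node id).
        target = min(candidates, key=lambda n: (n.used, n.node_id))
        target.store(chunks)
        assignments.append((client, target.node_id))
    return assignments


if __name__ == "__main__":
    window = [("x", {"p": 1}), ("y", {"q": 1}), ("z", {"r": 2})]

    online = [Node(0), Node(1)]
    for req in window:  # arrival order, one request at a time
        process_window(online, [req])

    batched = [Node(0), Node(1)]
    process_window(batched, window)  # the whole window at once

    print([n.used for n in online])   # [3, 1]
    print([n.used for n in batched])  # [2, 2]
```

In this toy run, handling each request as it arrives leaves the two nodes at 3 and 1 bytes. Seeing the whole window first lets the sketch place the larger request before the smaller ones, and both nodes end at 2 bytes. The result only illustrates the intuition behind batching and says nothing about GateD's measured performance.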