Geyao Cheng;Junxu Xia;Lailong Luo;Haibo Mi;Deke Guo;Richard T. B. Ma
IEEE Transactions on Cloud Computing, vol. 13, no. 1, pp. 46-60. Published 2024-11-19. DOI: 10.1109/TCC.2024.3502464. Impact factor: 5.3 (JCR Q1, Computer Science, Information Systems). https://ieeexplore.ieee.org/document/10758297/
HyperPart: A Hypergraph-Based Abstraction for Deduplicated Storage Systems
Deduplication minimizes space overhead by removing redundant data blocks across large-scale servers in data centers. However, it also exacerbates the fragmentation of data blocks, causing more cross-server file retrievals and sharply lower retrieval throughput. Some approaches favor retrieval performance by confining all blocks of a file to a single server, at the cost of non-trivial space consumption because more blocks are replicated across servers. An ideal network storage system should account for both deduplication and retrieval performance by assigning the detected unique blocks to servers judiciously. Such a fine-grained assignment requires an accurate and comprehensive abstraction of the files, the blocks, and the file-block affiliation relationships. To this end, we design a weighted hypergraph to profile the multivariate data correlations. With this abstraction in place, we propose HyperPart, which transforms the complex block allocation problem into a hypergraph partition problem. For more general scenarios with dynamic file updates, we further propose a two-phase incremental hypergraph repartition scheme, which mitigates performance degradation with minimal migration volume. We implement a prototype system of HyperPart, and the experimental results show that, compared with state-of-the-art methods, it saves around 50% of the storage space and improves retrieval throughput by approximately 30% under the balance constraints.
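To make the hypergraph view concrete, the following is a minimal sketch and not the HyperPart algorithm itself. It assumes that unique blocks are vertices weighted by block size and that each file is a hyperedge over its blocks, weighted by a hypothetical access weight; the abstract does not spell out these modeling details. The toy data, the function names, and the greedy heuristic are all illustrative assumptions. The cost counts, for each file, how many extra servers a retrieval must touch, and the capacity bound stands in for the balance constraint.

```python
from collections import defaultdict

# Hypothetical toy input: file name -> set of unique block fingerprints.
files = {
    "a.img": {"b1", "b2", "b3"},
    "b.img": {"b2", "b3", "b4"},
    "c.log": {"b5", "b6"},
}
# Assumed weights: block sizes (vertex weights), file access weights (hyperedge weights).
block_size = {"b1": 4, "b2": 4, "b3": 4, "b4": 4, "b5": 4, "b6": 4}
file_weight = {"a.img": 1, "b.img": 1, "c.log": 1}


def connectivity_cost(files, file_weight, placement):
    """Sum over files of weight * (number of servers spanned - 1)."""
    cost = 0
    for name, blocks in files.items():
        servers = {placement[b] for b in blocks}
        cost += file_weight[name] * (len(servers) - 1)
    return cost


def server_loads(block_size, placement, num_servers):
    """Total size of the unique blocks stored on each server."""
    loads = [0] * num_servers
    for block, server in placement.items():
        loads[server] += block_size[block]
    return loads


def greedy_partition(files, block_size, file_weight, num_servers, capacity):
    """Greedy stand-in for a real hypergraph partitioner: put each block on the
    server that already holds the most weight of blocks sharing a file with it,
    among servers that still have room."""
    block_to_files = defaultdict(list)
    for name, blocks in files.items():
        for b in blocks:
            block_to_files[b].append(name)

    placement, loads = {}, [0] * num_servers
    for name in files:
        for block in sorted(files[name]):
            if block in placement:
                continue  # a deduplicated block is stored exactly once
            score = [0] * num_servers
            for f in block_to_files[block]:
                for other in files[f]:
                    if other in placement:
                        score[placement[other]] += file_weight[f]
            fits = [s for s in range(num_servers)
                    if loads[s] + block_size[block] <= capacity]
            if not fits:
                raise ValueError("capacity too small for this placement order")
            # Prefer high affinity, then low load, then low server index.
            best = max(fits, key=lambda s: (score[s], -loads[s], -s))
            placement[block] = best
            loads[best] += block_size[block]
    return placement


placement = greedy_partition(files, block_size, file_weight, num_servers=2, capacity=12)
print(connectivity_cost(files, file_weight, placement))  # 1
print(server_loads(block_size, placement, 2))            # [12, 12]

# Baseline: round-robin placement that ignores file-block relationships.
round_robin = {b: i % 2 for i, b in enumerate(sorted(block_size))}
print(connectivity_cost(files, file_weight, round_robin))  # 3
```

On this toy input both placements store every unique block once and keep the two servers equally loaded, but the affinity-aware placement splits only one file across servers, while the round-robin baseline splits all three. A practical system would presumably replace the greedy loop with a dedicated hypergraph partitioner.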
About the journal:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.