Fast Garbage Collection in Erasure-Coded Storage Clusters
Hai Zhou; Dan Feng; Yuchong Hu; Wei Wang; Huadong Huang
IEEE Transactions on Computers, vol. 74, no. 8, pp. 2827-2840
Publication date: 2025-06-03
DOI: 10.1109/TC.2025.3575914
URL: https://ieeexplore.ieee.org/document/11022771/
Abstract
Erasure codes (EC) are widely adopted in clusters to provide high data reliability at low storage cost. Deletions and out-of-place updates leave some data blocks invalid, which gives rise to the tedious garbage collection (GC) problem. Existing designs still suffer from several limitations: substantial network traffic, unbalanced traffic load, and low read/write performance after GC. This paper proposes FastGC, a fast garbage collection method that merges old stripes into a new stripe and reclaims invalid blocks. FastGC quickly generates an efficient merge solution through stripe grouping and bit-sequence operations to minimize network traffic, and it maintains the data block distribution of the same stripe to ensure read performance. It carefully allocates storage space for new stripes during merging to eliminate the discontinuous free space that degrades write performance. Furthermore, to accelerate parity updates after merging, FastGC greedily schedules the transmission links for multi-stripe updates to balance the traffic load across nodes and adopts a maximum flow algorithm to saturate bandwidth utilization. Evaluations through simulations and Alibaba ECS experiments show that FastGC can reduce network traffic by 10.36%-81.22% and GC time by 34.25%-72.36% while maintaining read/write performance after GC.
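The abstract does not spell out the bit-sequence operations, so the following is only an illustrative sketch and not the paper's algorithm. It assumes (hypothetically) that each old stripe records its valid data blocks as a bitmask with one bit per block position, and that two stripes can be merged without relocating data when their masks do not overlap. The function name `pair_stripes` and the first-fit pairing rule are assumptions made for illustration.

```python
from typing import List, Tuple


def pair_stripes(valid_masks: List[int]) -> List[Tuple[int, int, int]]:
    """Pair stripes whose valid-block bitmasks do not overlap (first fit).

    valid_masks[i] has bit p set when stripe i still holds a valid data
    block at position p. Two stripes are treated as mergeable in place
    when (mask_a & mask_b) == 0; the merged stripe's mask is mask_a | mask_b.
    This pairing rule is an assumption, not the method from the paper.
    """
    unused = list(range(len(valid_masks)))
    pairs = []
    while unused:
        i = unused.pop(0)
        # First remaining stripe whose valid positions do not collide with i.
        partner = next(
            (j for j in unused if valid_masks[i] & valid_masks[j] == 0),
            None,
        )
        if partner is None:
            continue  # no collision-free partner; stripe i stays unmerged
        unused.remove(partner)
        pairs.append((i, partner, valid_masks[i] | valid_masks[partner]))
    return pairs


# Five stripes with four block positions each; the last one is fully valid.
masks = [0b1100, 0b0011, 0b1010, 0b0101, 0b1111]
result = pair_stripes(masks)
print(result)
```

In this toy input, stripes 0 and 1 and stripes 2 and 3 have complementary valid positions, so each pair fills one complete new stripe, while the fully valid stripe 4 has no collision-free partner and is left alone.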
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.