Vincenzo Maisto, Alessandro Cilardo, Emilio Billi, Chuck Fader
Title: A hardware/software architecture for multi-threaded offloading of erasure codes in distributed file systems
DOI: 10.1016/j.future.2025.108187
Journal: Future Generation Computer Systems-The International Journal of Escience, Volume 176, Article 108187
Publication date: 2025-10-04 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004819
Impact Factor: 6.2; JCR: Q1 (Computer Science, Theory & Methods); Region: 2 (Computer Science)
Open access: no
Platform: Semanticscholar
Citations: 0
Abstract
Big Data analytics and cloud computing place ever-growing demands on data-center providers in terms of computational requirements, latency, and storage. Distributed file systems offer the strategic advantage of scaling out computing and storage resources, hence allowing for notable speed-ups with massively parallel and distributed computing paradigms. On the other hand, such distributed clusters are constantly challenged by storage failures. Data replication is often deployed to ensure fault tolerance and business continuity, typically in a 3x configuration. This results in expensive 200% overheads in storage space, write propagation, and energy costs. Erasure codes offer an alternative approach to fault tolerance by allowing reconstruction of erased data chunks, while reducing storage overhead to as little as 30%. However, a considerable share of CPU cycles and energy is spent computing such codes, effectively reducing the cluster’s efficiency and starving other user and system tasks. Offloading to a custom accelerator is a non-trivial issue, due to the highly multi-threaded nature of such tasks and the lack of robust multi-threading support in conventional accelerator runtimes.
In this work, we present a heterogeneous hardware/software architectural design for large-scale and multi-threaded acceleration of distributed erasure codes on PCIe accelerators, and a new abstraction and integration model for distributed accelerators in fault-tolerant storage systems. We enable safe and seamless deployment of multi-threaded SYCL-based IP cores through a hardware thread-proxying layer that provides software thread isolation and integration with cluster-level middlewares. In addition, our design allows for heterogeneous cluster configurations, with full compatibility and transparent integration of heterogeneously-accelerated and CPU-only nodes. We systematically evaluate the individual layers of our architecture and validate the design’s integration in a container-based HDFS cluster, comparing performance against the state-of-the-art AVX-512-accelerated ISA-L library and other SYCL substrates, such as GPUs and single-threaded FPGAs.
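The overhead figures in the abstract can be made concrete with a minimal sketch (illustrative parameters, not the paper's implementation): 3x replication stores two extra copies of each chunk (200% overhead), while a Reed-Solomon-style RS(k, m) code stores m parity chunks per k data chunks (m/k overhead, e.g. 30% for k=10, m=3). The simplest erasure code, a single XOR parity chunk, already shows how an erased chunk can be reconstructed rather than re-copied:

```python
from functools import reduce

def xor_parity(chunks):
    """Compute one parity chunk as the bytewise XOR of equal-length data chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

def reconstruct(survivors, parity):
    """Recover a single erased chunk: XOR of all survivors and the parity."""
    return xor_parity(survivors + [parity])

# Storage-overhead arithmetic quoted in the abstract (illustrative):
replication_overhead = (3 - 1) / 1   # 3x replication -> 2.0, i.e. 200%
rs_overhead = 3 / 10                 # RS(10, 3)      -> 0.3, i.e. 30%

data = [b"chunk-A!", b"chunk-B!", b"chunk-C!"]
parity = xor_parity(data)

# Erase chunk 1, then rebuild it from the two survivors plus the parity.
recovered = reconstruct([data[0], data[2]], parity)
assert recovered == data[1]
```

A single parity chunk tolerates only one erasure; production systems such as the RS(10, 3) layout mentioned above use Galois-field arithmetic to tolerate multiple simultaneous failures, which is the computation the paper offloads to PCIe accelerators.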
Journal Description
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.