Vincenzo Maisto, Alessandro Cilardo, Emilio Billi, Chuck Fader
Title: A hardware/software architecture for multi-threaded offloading of erasure codes in distributed file systems
DOI: 10.1016/j.future.2025.108187
Journal: Future Generation Computer Systems-The International Journal of Escience, Volume 176, Article 108187
Publication date: 2025-10-04 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0167739X25004819
Impact Factor: 6.2; JCR: Q1 (Computer Science, Theory & Methods); Region: 2 (Computer Science)
Open access: no
Platform: Semanticscholar
Citations: 0
Abstract
Big Data analytics and cloud computing place ever-growing demands on data-center providers in terms of computational requirements, latency, and storage. Distributed file systems offer the strategic advantage of scaling out computing and storage resources, hence allowing for notable speed-ups with massively parallel and distributed computing paradigms. On the other hand, such distributed clusters are constantly challenged by storage failures. Data replication is often deployed to ensure fault tolerance and business continuity, typically in a 3x configuration. This results in expensive 200% overheads in storage space, write propagation, and energy costs. Erasure codes offer an alternative approach to fault tolerance by allowing reconstruction of erased data chunks, while reducing storage overhead to as little as 30%. However, a considerable share of CPU cycles and energy is spent computing such codes, effectively reducing the cluster’s efficiency and starving other user and system tasks. Offloading to a custom accelerator is a non-trivial issue, due to the highly multi-threaded nature of such tasks and the lack of robust multi-threading support in conventional accelerator runtimes.
In this work, we present a heterogeneous hardware/software architectural design for large-scale and multi-threaded acceleration of distributed erasure codes on PCIe accelerators, and a new abstraction and integration model for distributed accelerators in fault-tolerant storage systems. We enable safe and seamless deployment of multi-threaded SYCL-based IP cores through a hardware thread-proxying layer that provides software thread isolation and integration with cluster-level middlewares. In addition, our design allows for heterogeneous cluster configurations, with full compatibility and transparent integration of heterogeneously-accelerated and CPU-only nodes. We systematically evaluate the individual layers of our architecture and validate the design’s integration in a container-based HDFS cluster, comparing performance against the state-of-the-art AVX-512-accelerated ISA-L library and other SYCL substrates, such as GPUs and single-threaded FPGAs.
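The overhead figures in the abstract can be made concrete with a minimal sketch (illustrative parameters, not the paper's implementation): 3x replication stores two extra copies of each chunk (200% overhead), while a Reed-Solomon-style RS(k, m) code stores m parity chunks per k data chunks (m/k overhead, e.g. 30% for k=10, m=3). The simplest erasure code, a single XOR parity chunk, already shows how an erased chunk can be reconstructed rather than re-copied:

```python
from functools import reduce

def xor_parity(chunks):
    """Compute one parity chunk as the bytewise XOR of equal-length data chunks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

def reconstruct(survivors, parity):
    """Recover a single erased chunk: XOR of all survivors and the parity."""
    return xor_parity(survivors + [parity])

# Storage-overhead arithmetic quoted in the abstract (illustrative):
replication_overhead = (3 - 1) / 1   # 3x replication -> 2.0, i.e. 200%
rs_overhead = 3 / 10                 # RS(10, 3)      -> 0.3, i.e. 30%

data = [b"chunk-A!", b"chunk-B!", b"chunk-C!"]
parity = xor_parity(data)

# Erase chunk 1, then rebuild it from the two survivors plus the parity.
recovered = reconstruct([data[0], data[2]], parity)
assert recovered == data[1]
```

A single parity chunk tolerates only one erasure; production systems such as the RS(10, 3) layout mentioned above use Galois-field arithmetic to tolerate multiple simultaneous failures, which is the computation the paper offloads to PCIe accelerators.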
Journal Description
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.