Title: Accelerating memory and I/O intensive HPC applications using hardware compression
Authors: Saleh AlSaleh, Muhammad E.S. Elrabaa, Aiman El-Maleh, Khaled Daud, Ayman Hroub, Muhamed Mudawar, Thierry Tonellot
Journal: Journal of Parallel and Distributed Computing (Impact Factor 3.4; JCR Q1, Computer Science, Theory & Methods)
Published: 2024-07-23
DOI: 10.1016/j.jpdc.2024.104955
URL: https://www.sciencedirect.com/science/article/pii/S0743731524001199
Citation count: 0
Abstract
Recently, accelerator-based compression/decompression was proposed to hide the storage latency of high-performance computing (HPC) applications that generate or ingest data sets too large to fit in a node's memory. This work implements such an approach using a novel FPGA-based lossy compression/decompression scheme with very low latency. The proposed scheme completely overlaps the movement of the application's data with its compute kernels on the CPU, with minimal impact on these kernels. Experiments showed that it can yield performance on par with memory-only storage buffers, even though the data is actually stored on disk. Experiments also showed that, compared to CPU- and GPU-based compression frameworks, it achieves better performance at a fraction of the power consumption.
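The overlap idea in the abstract can be illustrated with a small software-only sketch. It is not the authors' FPGA design: it assumes a background thread in place of the accelerator, uses lossless `zlib` as a stand-in for the paper's lossy codec, and every name in it (`compute_kernel`, `writer`, `run`, `read_blocks`) is hypothetical. It only shows the pattern of handing each output block to a worker so that compression and disk writes proceed while the next compute step runs.

```python
import os
import queue
import tempfile
import threading
import zlib


def compute_kernel(step: int, size: int = 1 << 16) -> bytes:
    """Hypothetical compute kernel: produces one block of output per step."""
    return bytes((step + i) % 251 for i in range(size))


def writer(jobs: queue.Queue, path: str, stats: dict) -> None:
    """Background worker: compress each block and append it to a file."""
    with open(path, "wb") as f:
        while True:
            block = jobs.get()
            if block is None:  # sentinel: the producer is finished
                break
            # Assumption: lossless zlib stands in for the lossy hardware codec.
            packed = zlib.compress(block, 1)
            f.write(len(packed).to_bytes(8, "little"))  # length prefix
            f.write(packed)
            stats["raw"] += len(block)
            stats["stored"] += len(packed)


def run(steps: int, path: str) -> dict:
    """Run the compute loop while a worker compresses and stores the output."""
    stats = {"raw": 0, "stored": 0}
    jobs = queue.Queue(maxsize=2)  # bounded: at most two blocks wait in memory
    worker = threading.Thread(target=writer, args=(jobs, path, stats))
    worker.start()
    for step in range(steps):
        # Hand the block off; the next compute step overlaps with compress + write.
        jobs.put(compute_kernel(step))
    jobs.put(None)
    worker.join()
    return stats


def read_blocks(path: str):
    """Read the length-prefixed blocks back and decompress them in order."""
    with open(path, "rb") as f:
        while header := f.read(8):
            yield zlib.decompress(f.read(int.from_bytes(header, "little")))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = run(4, os.path.join(tmp, "blocks.bin"))
        print(result["raw"], "bytes produced,", result["stored"], "bytes stored")
```

The bounded queue is the design choice that matters here: it caps the memory held by in-flight blocks, and the compute loop stalls only if the worker falls behind by more than two blocks.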
About the journal:
This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing.
The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.