SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers

arXiv - CS - Mathematical Software | Pub Date: 2024-04-08 | DOI: arxiv-2404.05303

Paul Scheffler, Luca Colagrande, Luca Benini

Citations: 0

Abstract

Stencil codes are performance-critical in many compute-intensive applications, but suffer from significant address calculation and irregular memory access overheads. This work presents SARIS, a general and highly flexible methodology for stencil acceleration using register-mapped indirect streams. We demonstrate SARIS for various stencil codes on an eight-core RISC-V compute cluster with indirect stream registers, achieving significant speedups of 2.72x, near-ideal FPU utilizations of 81%, and energy efficiency improvements of 1.58x over an RV32G baseline on average. Scaling out to a 256-core manycore system, we estimate an average FPU utilization of 64%, an average speedup of 2.14x, and up to 15% higher fractions of peak compute than a leading GPU code generator.
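The abstract attributes much of the stencil overhead to address calculation and irregular memory access. The Python sketch below is a loose software analogy only. It is not the authors' hardware mechanism or code. It contrasts two versions of a 5-point Jacobi sweep:

* a baseline that recomputes every neighbour address inside the loop;
* a version that consumes operands from a precomputed index sequence, which is roughly the role an indirect stream register plays in hardware.

The kernel, its weights, and every function name (`direct_stencil`, `build_index_streams`, `indirect_stream`, `streamed_stencil`) are hypothetical choices for illustration. On the actual cluster the stream is configured in hardware and read as a register, so the reported gains cannot be reproduced in Python.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical 5-point Jacobi weights in operand order (centre, north, south, west, east).
WEIGHTS = (0.5, 0.125, 0.125, 0.125, 0.125)


def direct_stencil(grid: List[float], nx: int, ny: int) -> List[float]:
    """Baseline sweep: neighbour addresses are recomputed for every point."""
    out = list(grid)  # boundary points are copied unchanged
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            i = y * nx + x
            out[i] = (WEIGHTS[0] * grid[i] + WEIGHTS[1] * grid[i - nx]
                      + WEIGHTS[2] * grid[i + nx] + WEIGHTS[3] * grid[i - 1]
                      + WEIGHTS[4] * grid[i + 1])
    return out


def build_index_streams(nx: int, ny: int) -> Tuple[List[int], List[int]]:
    """Precompute, once, every read index (in consumption order) and every write index."""
    offsets = (0, -nx, nx, -1, 1)  # must match the WEIGHTS order
    reads: List[int] = []
    writes: List[int] = []
    for y in range(1, ny - 1):
        for x in range(1, nx - 1):
            base = y * nx + x
            reads.extend(base + off for off in offsets)
            writes.append(base)
    return reads, writes


def indirect_stream(data: Sequence[float], indices: Sequence[int]) -> Iterator[float]:
    """Software stand-in for an indirect stream register: yields data[indices[k]] in order."""
    for k in indices:
        yield data[k]


def streamed_stencil(grid: List[float], reads: Sequence[int],
                     writes: Sequence[int]) -> List[float]:
    """Same sweep, but operands arrive from a stream; the loop does no address arithmetic."""
    out = list(grid)
    operands = indirect_stream(grid, reads)
    for dst in writes:
        acc = 0.0
        for w in WEIGHTS:
            acc += w * next(operands)  # each next() pops one operand, like reading a stream register
        out[dst] = acc
    return out


if __name__ == "__main__":
    nx, ny = 6, 5
    grid = [float((3 * i) % 7) for i in range(nx * ny)]
    reads, writes = build_index_streams(nx, ny)  # built once, reused for every time step
    a = b = grid
    for _ in range(3):
        a = direct_stencil(a, nx, ny)
        b = streamed_stencil(b, reads, writes)
    print(a == b)
```

The sketch tries to capture one design point. The index lists are built once and reused across time steps, so their cost is amortized, and the inner loop shrinks to operand pops and multiply-adds. Both versions perform the same floating-point operations in the same order, so their results match exactly.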

