A Hybrid Memory/Accelerator Tile Architecture for FPGA-based RISC-V Manycore Systems

Ahmed Kamaleldin, Diana Göhringer
Published in: 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL)
Publication date: 2022-08-01
DOI: 10.1109/FPL57034.2022.00053 (https://doi.org/10.1109/FPL57034.2022.00053)
Citations: 2

Abstract

Multi/manycore Systems-on-Chip are increasingly adopted for heterogeneous systems, providing a high degree of computing scalability and energy efficiency. However, the steady increase in the number of heterogeneous tiles drives up resource usage and design cost. Reusability and modularity of the tile architecture, so that one design can support different types of compute or memory units, are therefore key to reducing resource usage. Meanwhile, the proliferation of the RISC-V instruction set architecture has increased the modularity and reusability of compute tiles. In this work, we present a modular and reusable memory/accelerator tile architecture that supports two modes of operation: as a memory tile or as an accelerator tile. The proposed tile architecture is suitable for integration into a NoC-based manycore architecture alongside RISC-V-based compute tiles. The hybrid tile features a shared non-coherent scratchpad memory that can be accessed directly by RISC-V compute tiles through the NoC or by the local hardware accelerator logic inside the tile. Tile mode configuration and data transfer over the NoC are managed through control messages issued by RISC-V compute tiles according to the requirements of the running application. Moreover, the functionality of the local hardware accelerator can be changed at run-time using dynamic partial reconfiguration. For evaluation, two manycore configurations are developed, with 4 and 8 RISC-V compute tiles and 4 cores per tile. Several use cases based on signal processing kernels and hardware accelerators are used to evaluate memory transfer latency and computing time for both configurations. A maximum data transfer throughput of 500 MB/s is achieved between the proposed hybrid tile and a single RISC-V compute tile. The proposed tile architecture is implemented and evaluated on a Xilinx Virtex UltraScale+ FPGA.
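The two-mode behaviour described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design or message format: the names `TileMode`, `ControlMessage`, `HybridTile`, and `transfer_time_us` are hypothetical, the scratchpad size is arbitrary, and the timing helper assumes that 1 MB means 10^6 bytes and that the reported 500 MB/s peak is sustained for the whole transfer.

```python
from dataclasses import dataclass, field
from enum import Enum


class TileMode(Enum):
    MEMORY = "memory"
    ACCELERATOR = "accelerator"


@dataclass
class ControlMessage:
    """Hypothetical control message a compute tile sends over the NoC."""
    src_tile: int
    dst_tile: int
    mode: TileMode


@dataclass
class HybridTile:
    """Toy model of a hybrid tile with a shared scratchpad (sizes are assumptions)."""
    tile_id: int
    scratchpad_words: int = 1024
    mode: TileMode = TileMode.MEMORY
    scratchpad: list = field(default_factory=list)

    def __post_init__(self):
        self.scratchpad = [0] * self.scratchpad_words

    def _check(self, addr, length):
        if addr < 0 or length < 0 or addr + length > self.scratchpad_words:
            raise IndexError("scratchpad access out of range")

    def handle(self, msg):
        # A control message addressed to this tile selects its operating mode.
        if msg.dst_tile != self.tile_id:
            raise ValueError("message addressed to another tile")
        self.mode = msg.mode

    def write(self, addr, values):
        # Remote write from a compute tile into the scratchpad.
        self._check(addr, len(values))
        self.scratchpad[addr:addr + len(values)] = list(values)

    def read(self, addr, length):
        # Remote read from a compute tile.
        self._check(addr, length)
        return self.scratchpad[addr:addr + length]

    def run_accelerator(self, kernel, addr, length):
        # The local accelerator works in place on scratchpad data.
        if self.mode is not TileMode.ACCELERATOR:
            raise RuntimeError("tile is not in accelerator mode")
        self._check(addr, length)
        result = list(kernel(self.scratchpad[addr:addr + length]))
        if len(result) != length:
            raise ValueError("kernel must preserve the data length")
        self.scratchpad[addr:addr + length] = result


def transfer_time_us(num_bytes, throughput_mb_s=500.0):
    """Ideal transfer time in microseconds at a sustained peak throughput."""
    # bytes / (MB/s * 1e6 bytes/MB) seconds, times 1e6 us/s = bytes / (MB/s)
    return num_bytes / throughput_mb_s
```

In this model, a compute tile would write input data while the tile is in memory mode, send a `ControlMessage` to switch it to accelerator mode, let the kernel run on the scratchpad, and read the result back. Under the stated assumptions, moving a 4096-byte buffer at 500 MB/s takes about 8.2 microseconds, which ignores any NoC or protocol overhead.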