Automatic multidimensional memory partitioning for FPGA-based accelerators (abstract only)

FPGA. ACM International Symposium on Field-Programmable Gate Arrays, p. 269. Published 2013-02-11. DOI: 10.1145/2435264.2435321

Yuxin Wang, Peng Li, Peng Zhang, Chen Zhang, J. Cong

Abstract

As data-processing throughput in reconfigurable computing increases, data parallelism has become crucial to the performance of FPGA-based accelerators. However, most data-parallelism optimizations are still performed manually by experienced hardware designers. Memory partitioning is widely adopted to increase memory bandwidth efficiently by distributing data across multiple memory banks and reducing access conflicts. Previous memory partitioning methods focused mainly on one-dimensional arrays. As a consequence, designers must flatten a multidimensional array to fit those methodologies, which makes the partitioning depend on the widths of the array's dimensions. In this work we propose an automatic memory partitioning scheme for multidimensional arrays that provides high on-chip memory throughput for loop pipelining in high-level synthesis. A linear transformation is applied to optimize the layout of data elements in the memory banks, making the partitioning independent of the dimension widths. Two transformation vectors map each original data element to a bank and to an offset within that bank. The vector for the optimal bank mapping is determined by a conflict-free access constraint. In addition, a memory padding technique is proposed to find a vector for the inner-bank offset that trades off practicality against optimality. We use six benchmarks with different access patterns to evaluate our approach. Compared to previous one-dimensional partitioning work, the experimental results show that our approach saves up to 21% of block RAMs, 19% of slices, and 46% of DSPs.
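The abstract does not state the mapping formulas, so the following is only a minimal Python sketch under stated assumptions. It assumes the bank index is B(x) = (α · x) mod N and the inner-bank offset is F(x) = floor((β · x) / N), where β holds the row-major strides of the array after its last dimension is padded to a multiple of N. The 5-point stencil access pattern, single-port banks, the exhaustive search, the extra gcd(α_last, N) = 1 restriction, the flattened cyclic baseline, and every function name (`find_bank_mapping`, `padded_strides`, `min_banks_flattened`, etc.) are hypothetical illustrations, not the authors' algorithm.

```python
from itertools import product
from math import gcd, prod

# Hypothetical access pattern: offsets from (i, j) read in the same pipelined
# iteration of a 2D 5-point stencil. Each bank is assumed to be single-port.
STENCIL = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conflict_free(alpha, offsets, n_banks):
    """True if every access of one iteration hits a distinct bank.

    Assumed bank function: B(x) = (alpha . x) mod N. Since every access is
    x = base + d, two accesses collide for some base iff
    alpha . (d_i - d_j) == 0 (mod N), so checking the offsets suffices.
    """
    banks = {dot(alpha, d) % n_banks for d in offsets}
    return len(banks) == len(offsets)


def find_bank_mapping(offsets, max_banks=64):
    """Exhaustively search for the smallest N and a conflict-free alpha.

    Also requires gcd(alpha[-1], N) == 1 so the padded offset below gives
    every element a unique (bank, offset) pair (our simplification).
    """
    dims = len(offsets[0])
    for n in range(len(offsets), max_banks + 1):
        for alpha in product(range(n), repeat=dims):
            if gcd(alpha[-1], n) == 1 and conflict_free(alpha, offsets, n):
                return n, alpha
    raise ValueError("no conflict-free mapping up to max_banks")


def padded_strides(shape, n_banks):
    """Pad the last dimension up to a multiple of N; return (padded shape, strides)."""
    padded = list(shape)
    padded[-1] = -(-shape[-1] // n_banks) * n_banks  # ceiling to a multiple of N
    strides = [1] * len(padded)
    for k in range(len(padded) - 2, -1, -1):
        strides[k] = strides[k + 1] * padded[k + 1]
    return padded, strides


def map_element(x, alpha, beta, n_banks):
    """(bank, offset) = ((alpha . x) mod N, floor((beta . x) / N))."""
    return dot(alpha, x) % n_banks, dot(beta, x) // n_banks


def min_banks_flattened(offsets, width, max_banks=64):
    """Baseline: flatten (i, j) -> i * width + j, then cyclic bank = index mod N."""
    flat = [i * width + j for i, j in offsets]
    for n in range(len(offsets), max_banks + 1):
        if len({f % n for f in flat}) == len(flat):
            return n
    raise ValueError("no cyclic partition up to max_banks")


if __name__ == "__main__":
    n, alpha = find_bank_mapping(STENCIL)
    print("banks:", n, "alpha:", alpha)  # banks: 5 alpha: (1, 2)
    for width in (62, 64):
        # Width-dependent: 5 banks for width 62, 6 banks for width 64.
        print(width, "flattened banks:", min_banks_flattened(STENCIL, width))

    # Every element of an 8 x 64 array gets a unique (bank, offset) pair.
    shape = (8, 64)
    padded, beta = padded_strides(shape, n)
    pairs = {map_element((i, j), alpha, beta, n)
             for i in range(shape[0]) for j in range(shape[1])}
    assert len(pairs) == shape[0] * shape[1]
    print("bank depth:", prod(padded) // n)  # bank depth: 104
```

Under these assumptions the search returns N = 5 and α = (1, 2) for the stencil regardless of the array width, whereas the flattened cyclic baseline needs 5 banks at width 62 but 6 at width 64. Padding the 64 columns to 65 leaves 8 of 520 words unused, a small instance of the kind of practicality-versus-optimality trade-off the abstract mentions.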
