Title: Using buffer-to-BRAM mapping approaches to trade-off throughput vs. memory use
Authors: Jasmina Vasiljevic, P. Chow
Published in: 2014 24th International Conference on Field Programmable Logic and Applications (FPL)
Publication date: 2014-10-20
DOI: 10.1109/FPL.2014.6927469
URL: https://doi.org/10.1109/FPL.2014.6927469
Citations: 6
Abstract
One of the challenges in designing high-performance FPGA applications is dividing limited on-chip memory among an application's many buffers. To achieve the desired performance within the on-chip memory budget, the designer must manually assign application buffers to physical on-chip memories. Mismatches between the dimensions (bit width and depth) of buffers and those of physical on-chip memories leave memories underutilized. Memory utilization can be increased via buffer packing, that is, grouping buffers together and implementing them as a single memory, at the expense of data throughput. However, identifying the buffer groups that require the least physical memory is a combinatorial problem with a large search space. Doing so by hand is time-consuming and non-trivial, particularly with a large number of buffers of various depths and bit widths. Previous work [1] introduced a tool that provides high-level pragmas allowing the user to specify global memory requirements, such as an application's on-chip memory budget and data throughput. This paper extends that work by introducing two low-level pragmas that specify information about memory access patterns, improving on-chip memory utilization by up to 22%. Further, we develop a simulated-annealing-based buffer packing algorithm that reduces the tool's run-time from over 30 minutes to 15 seconds while also improving the performance of the generated memory solution. Finally, we demonstrate the effectiveness of our tool with four stream application benchmarks.
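
To make the packing trade-off concrete, the following is a minimal sketch, not the algorithm from the paper. It rests on several assumptions that do not come from the abstract: the block RAM aspect ratios in `BRAM_SHAPES` (typical of an 18 Kb block), a packing model in which grouped buffers are stacked in depth, a hypothetical `max_per_group` cap standing in for the throughput requirement (buffers in one memory share its ports), and a simple geometric cooling schedule. All function names are illustrative.

```python
import math
import random

# ASSUMPTION: (depth, width) aspect ratios of one 18 Kb block RAM; illustrative only.
BRAM_SHAPES = [(512, 36), (1024, 18), (2048, 9), (4096, 4), (8192, 2), (16384, 1)]


def brams_needed(depth, width):
    """Fewest blocks that implement a depth x width memory using a single shape."""
    return min(math.ceil(depth / d) * math.ceil(width / w) for d, w in BRAM_SHAPES)


def group_cost(group):
    """ASSUMED packing model: stack buffers in depth, so depths add and the
    memory is as wide as the widest buffer in the group."""
    if not group:
        return 0
    return brams_needed(sum(d for d, _ in group), max(w for _, w in group))


def total_cost(buffers, assignment, n_groups):
    """Total block count when buffers[i] is placed in group assignment[i]."""
    groups = [[] for _ in range(n_groups)]
    for buf, g in zip(buffers, assignment):
        groups[g].append(buf)
    return sum(group_cost(g) for g in groups)


def anneal(buffers, max_per_group, steps=20000, t0=2.0, alpha=0.9995, seed=0):
    """Simulated annealing over buffer-to-group assignments.

    max_per_group is a hypothetical stand-in for a throughput constraint:
    the more buffers share one memory, the less bandwidth each one gets.
    """
    rng = random.Random(seed)
    n = len(buffers)
    assignment = list(range(n))  # start unpacked: one buffer per memory
    cost = total_cost(buffers, assignment, n)
    best, best_cost = assignment[:], cost
    t = t0
    for _ in range(steps):
        i, g = rng.randrange(n), rng.randrange(n)
        old = assignment[i]
        # Only try moves that change the group and respect the size cap.
        if g != old and assignment.count(g) < max_per_group:
            assignment[i] = g
            new_cost = total_cost(buffers, assignment, n)
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = assignment[:], cost
            else:
                assignment[i] = old  # reject the move
        t *= alpha  # geometric cooling
    return best, best_cost
```

Under these assumptions, eight 128 x 8 buffers each occupy a whole block when left unpacked, while a single 1024 x 8 memory holding all of them fits in one block. Calling `anneal([(128, 8)] * 8, max_per_group=4)` searches for a grouping that lowers the block count without letting more than four buffers share a memory; tightening `max_per_group` trades memory savings back for throughput.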