Load-Store Queue Sizing for Efficient Dataflow Circuits

2022 International Conference on Field-Programmable Technology (ICFPT), Pub Date: 2022-12-05, DOI: 10.1109/ICFPT56656.2022.9974425

Jiantao Liu, Carmine Rizzi, Lana Josipović


Citations: 2

Abstract

Dataflow circuits implement dynamic scheduling and have recently been explored as an alternative to standard, statically scheduled high-level synthesis (HLS) solutions. In contrast to static HLS, dataflow circuits resolve memory dependencies at runtime by employing load-store queues (LSQs) at the memory interface. However, LSQs are extremely resource-expensive to implement in a spatial system and may cause notable frequency degradation. There is therefore a clear need to minimize their size and complexity while still allowing the circuit to achieve a high computational rate. So far, designers have resorted to manually tuning the LSQ depth (i.e., the number of queue entries) to trade off area and performance; yet this approach is time-consuming and infeasible for complex designs. In this work, we develop a strategy to automatically determine the most affordable LSQ depths in dataflow circuits while maintaining the best possible circuit throughput. We demonstrate our technique on benchmarks obtained from C code with different memory access patterns and show that it can effectively produce the desired Pareto-optimal design points.
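
To make the notion of a Pareto-optimal design point concrete, the following minimal Python sketch filters a set of candidate LSQ depths by area and throughput. It is not the sizing strategy of the paper, which the abstract does not detail; the names DesignPoint, dominates, and pareto_front are hypothetical, and the numbers are made-up placeholders rather than measured results.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    lsq_depth: int      # number of queue entries
    area: float         # resource cost, arbitrary units (lower is better)
    throughput: float   # computational rate, arbitrary units (higher is better)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.area <= b.area and a.throughput >= b.throughput
    strictly_better = a.area < b.area or a.throughput > b.throughput
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep the points that no other point dominates, sorted by area."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.area)


# Made-up placeholder values for illustration only, NOT data from the paper.
candidates = [
    DesignPoint(lsq_depth=2, area=10.0, throughput=0.25),
    DesignPoint(lsq_depth=4, area=18.0, throughput=0.50),
    DesignPoint(lsq_depth=8, area=35.0, throughput=1.00),
    DesignPoint(lsq_depth=16, area=70.0, throughput=1.00),
    DesignPoint(lsq_depth=32, area=140.0, throughput=1.00),
]

front = pareto_front(candidates)

# The cheapest point that still reaches the best throughput on the front.
best_throughput = max(p.throughput for p in front)
smallest_at_peak = min(
    (p for p in front if p.throughput == best_throughput),
    key=lambda p: p.area,
)
```

In this toy data set, deeper queues beyond 8 entries cost more area without raising throughput, so they drop off the front; the remaining points are the area/performance trade-offs a designer would otherwise have to find by manual tuning.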
