Load-Store Queue Sizing for Efficient Dataflow Circuits

2022 International Conference on Field-Programmable Technology (ICFPT), Pub Date: 2022-12-05, DOI: 10.1109/ICFPT56656.2022.9974425

Jiantao Liu, Carmine Rizzi, Lana Josipović

Citations: 2

Abstract

Load-Store Queue Sizing for Efficient Dataflow Circuits

Dataflow circuits implement dynamic scheduling and have recently been explored as an alternative to standard, statically scheduled high-level synthesis (HLS) solutions. In contrast to static HLS, dataflow circuits resolve memory dependencies during runtime by employing load-store queues (LSQs) at the memory interface. However, LSQs are extremely resource-expensive to implement in a spatial system and may cause notable frequency degradation. Therefore, there is a clear need to minimize their size and complexity, while still allowing the circuit to achieve a high computational rate. So far, designers resorted to manually tuning the LSQ depth (i.e., number of queue entries) to trade off area and performance; yet, this approach is evidently time-consuming and unfeasible for complex designs. In this work, we develop a strategy to automatically determine the most affordable LSQ depths in dataflow circuits while maintaining the best possible circuit throughput. We demonstrate our technique on benchmarks obtained from C code with different memory access patterns and show that it can effectively produce the desired Pareto-optimal design points.
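
To make the area/throughput trade-off described above concrete, here is a minimal sketch of how Pareto-optimal LSQ depths could be picked out of a set of candidate design points. It is an illustration only: the `DesignPoint` type, the helper names, and every number in `candidates` are hypothetical and do not come from the paper, and the brute-force filter is a generic technique, not the authors' sizing strategy.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    lsq_depth: int     # number of LSQ entries
    area: float        # resource cost in arbitrary units (hypothetical)
    throughput: float  # e.g. loop iterations per cycle (hypothetical)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """a dominates b if it is no worse in both objectives and strictly better in one."""
    no_worse = a.area <= b.area and a.throughput >= b.throughput
    strictly_better = a.area < b.area or a.throughput > b.throughput
    return no_worse and strictly_better


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates (brute force, O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def smallest_depth_at_peak(points: List[DesignPoint]) -> DesignPoint:
    """Return the shallowest LSQ that still reaches the best observed throughput."""
    best = max(p.throughput for p in points)
    return min((p for p in points if p.throughput == best),
               key=lambda p: p.lsq_depth)


# Made-up measurements: throughput saturates at depth 8, area keeps growing.
candidates = [
    DesignPoint(lsq_depth=2, area=10.0, throughput=0.25),
    DesignPoint(lsq_depth=4, area=18.0, throughput=0.50),
    DesignPoint(lsq_depth=8, area=35.0, throughput=1.00),
    DesignPoint(lsq_depth=16, area=70.0, throughput=1.00),
    DesignPoint(lsq_depth=32, area=140.0, throughput=1.00),
]

if __name__ == "__main__":
    for point in pareto_front(candidates):
        print(point)
    print("cheapest depth at peak throughput:",
          smallest_depth_at_peak(candidates).lsq_depth)
```

With these invented numbers, depths 16 and 32 drop out of the front because they cost more area than depth 8 without improving throughput, which is the kind of over-provisioning that automatic sizing aims to avoid.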
