{"title":"Limits of Statically-Scheduled Token Dataflow Processing","authors":"Nachiket Kapre, Siddhartha","doi":"10.1109/DFM.2014.21","DOIUrl":null,"url":null,"abstract":"FPGA-based token dataflow processing has been shown to accelerate hard-to-parallelize problems exhibiting irregular dataflow parallelism by as much as an order of magnitude when compared to conventional compute organizations. However, when the structure of the dataflow computation is known upfront, either at compile time or at the start of execution, we can employ static scheduling techniques to further improve performance and enhance compute density of the dataflow hardware. In this paper, we identify the costs and performance trends of both static and dynamic scheduling approaches when considering hardware acceleration of SPICE device equations and Sparse LU factorization in circuit graphs. While the experiments are limited to a case study, the hardware design and dataflow compiler are general and can be extended to other problems and instances where dataflow computing may be applicable. With this study, we hope to develop a quantitative basis for the design of a hybrid dataflow architecture that combines both static and dynamic scheduling techniques. We observe a performance benefit of 2 - 4× and a resource utilization saving of 2 - 3× in favor of statically scheduled hardware.","PeriodicalId":183526,"journal":{"name":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DFM.2014.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
FPGA-based token dataflow processing has been shown to accelerate hard-to-parallelize problems exhibiting irregular dataflow parallelism by as much as an order of magnitude when compared to conventional compute organizations. However, when the structure of the dataflow computation is known upfront, either at compile time or at the start of execution, we can employ static scheduling techniques to further improve performance and enhance the compute density of the dataflow hardware. In this paper, we identify the costs and performance trends of both static and dynamic scheduling approaches when considering hardware acceleration of SPICE device equations and sparse LU factorization on circuit graphs. While the experiments are limited to a case study, the hardware design and dataflow compiler are general and can be extended to other problems and instances where dataflow computing may be applicable. With this study, we hope to develop a quantitative basis for the design of a hybrid dataflow architecture that combines both static and dynamic scheduling techniques. We observe a performance benefit of 2-4× and a resource utilization saving of 2-3× in favor of statically scheduled hardware.
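The abstract contrasts dynamic token matching with static scheduling when the dataflow graph is known before execution. The paper's actual compiler and hardware are not reproduced here; the sketch below is only a minimal illustration of the static side of that idea, using an assumed ASAP list scheduler, a hypothetical toy graph, unit operation latency, and a fixed number of processing elements.

```python
# Minimal sketch (not the paper's compiler): given a dataflow graph whose
# structure is known upfront, a static list scheduler assigns each operation
# a (cycle, processing-element) slot at compile time, so the hardware needs
# no runtime token matching. The graph, PE count, and unit latency are
# illustrative assumptions.
from collections import defaultdict

def static_list_schedule(deps, num_pes):
    """deps: op -> list of predecessor ops. Returns op -> (cycle, pe)."""
    # ASAP level = longest dependency path from the graph inputs.
    level = {}
    def asap(op):
        if op not in level:
            level[op] = 0 if not deps[op] else 1 + max(asap(p) for p in deps[op])
        return level[op]
    for op in deps:
        asap(op)

    # Pack operations level by level onto the fixed pool of PEs.
    by_level = defaultdict(list)
    for op, lvl in level.items():
        by_level[lvl].append(op)

    schedule, cycle = {}, 0
    for lvl in sorted(by_level):
        ops = by_level[lvl]
        for i, op in enumerate(ops):
            schedule[op] = (cycle + i // num_pes, i % num_pes)
        cycle += (len(ops) + num_pes - 1) // num_pes  # ceil(#ops / #PEs)
    return schedule

if __name__ == "__main__":
    # Tiny dataflow graph standing in for a device-evaluation kernel.
    deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["a"], "e": ["c", "d"]}
    for op, (cyc, pe) in sorted(static_list_schedule(deps, num_pes=2).items()):
        print(f"{op} -> cycle {cyc}, PE {pe}")
```

A dynamically scheduled design would instead match arriving tokens against waiting operations at runtime; the statically scheduled datapath avoids that matching logic, which is consistent with the resource savings the abstract reports, at the cost of requiring the graph structure before execution.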