Type-directed scheduling of streaming accelerators

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. Pub Date: 2020-06-11. DOI: 10.1145/3385412.3385983

David Durst, Matthew Feldman, Dillon Huff, David Akeley, Ross G. Daly, G. Bernstein, Marco Patrignani, K. Fatahalian, P. Hanrahan


Citations: 35

Abstract

Designing efficient, application-specialized hardware accelerators requires assessing trade-offs between a hardware module’s performance and resource requirements. To facilitate hardware design space exploration, we describe Aetherling, a system for automatically compiling data-parallel programs into statically scheduled, streaming hardware circuits. Aetherling contributes a space- and time-aware intermediate language featuring data-parallel operators that represent parallel or sequential hardware modules, and sequence data types that encode a module’s throughput by specifying when sequence elements are produced or consumed. As a result, well-typed operator composition in the space-time language corresponds to connecting hardware modules via statically scheduled, streaming interfaces. We provide rules for transforming programs written in a standard data-parallel language (that carries no information about hardware implementation) into equivalent space-time language programs. We then provide a scheduling algorithm that searches over the space of transformations to quickly generate area-efficient hardware designs that achieve a programmer-specified throughput. Using benchmarks from the image processing domain, we demonstrate that Aetherling enables rapid exploration of hardware designs with different throughput and area characteristics, and yields results that require 1.8-7.9× fewer FPGA slices than those of prior hardware generation systems.
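To make the idea of throughput-encoding sequence types concrete, here is a minimal Python sketch. It is not Aetherling's implementation: the names `SSeq`, `TSeq`, `Int`, `Module`, `throughput`, and `compose`, and the exact fields they carry, are assumptions chosen for illustration. The sketch only models the claim in the abstract that a type states when elements are produced or consumed, so that two modules can be connected only if their interface types agree.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Union


@dataclass(frozen=True)
class Int:
    """Atomic element type (hypothetical name)."""


@dataclass(frozen=True)
class SSeq:
    """Assumed 'space' sequence: n elements delivered in the same clock cycle."""
    n: int
    elem: "Type"


@dataclass(frozen=True)
class TSeq:
    """Assumed 'time' sequence: n elements on n consecutive periods,
    followed by `idle` periods in which no element is delivered."""
    n: int
    idle: int
    elem: "Type"


Type = Union[Int, SSeq, TSeq]


def cycles(t: Type) -> int:
    """Clock cycles needed to transfer one value of type t."""
    if isinstance(t, TSeq):
        return (t.n + t.idle) * cycles(t.elem)
    if isinstance(t, SSeq):
        return cycles(t.elem)
    return 1


def atoms(t: Type) -> int:
    """Number of atomic elements contained in one value of type t."""
    if isinstance(t, (SSeq, TSeq)):
        return t.n * atoms(t.elem)
    return 1


def throughput(t: Type) -> Fraction:
    """Average atomic elements per clock cycle implied by the type."""
    return Fraction(atoms(t), cycles(t))


@dataclass(frozen=True)
class Module:
    """A hardware module described only by its input and output types."""
    name: str
    in_type: Type
    out_type: Type


def compose(producer: Module, consumer: Module) -> Module:
    """Connect two modules; reject the connection if the schedules differ."""
    if producer.out_type != consumer.in_type:
        raise TypeError(
            f"cannot connect {producer.name} to {consumer.name}: "
            f"{producer.out_type} != {consumer.in_type}"
        )
    return Module(f"{producer.name} >>> {consumer.name}",
                  producer.in_type, consumer.out_type)


# Usage: a fully parallel map and a fully sequential map over 4 integers.
par_map = Module("map_parallel", SSeq(4, Int()), SSeq(4, Int()))
seq_map = Module("map_sequential", TSeq(4, 0, Int()), TSeq(4, 0, Int()))
pipeline = compose(par_map, par_map)   # accepted: interface types match
# compose(par_map, seq_map) would raise TypeError: the schedules differ
```

Under these assumptions, `throughput(SSeq(4, Int()))` is 4 elements per cycle and `throughput(TSeq(4, 0, Int()))` is 1, so the same four-element computation can be given a wide, fast interface or a narrow, slow one. A scheduler in this spirit would choose among such typings to meet a target throughput with the least area.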



