Elastic CGRAs

FPGA. ACM International Symposium on Field-Programmable Gate Arrays Pub Date : 2013-02-11 DOI:10.1145/2435264.2435296

Yuanjie Huang, P. Ienne, O. Temam, Yunji Chen, Chengyong Wu

Pages: 171-180

Citations: 57

Abstract

Vital technology trends such as voltage scaling and homogeneous multicore scaling have reached their limits, and architects are turning to alternative computing paradigms, such as heterogeneous and domain-specialized solutions. Coarse-Grain Reconfigurable Arrays (CGRAs) promise the performance of massively spatial computing while offering interesting trade-offs of flexibility versus energy efficiency. Yet configuring and scheduling execution for CGRAs generally runs into the classic difficulties that have hampered Very-Long Instruction Word (VLIW) architectures: efficient schedules are difficult to generate, especially for applications with complex control flow and data structures, and they are inherently static and thus ill-adapted to variable-latency components (such as the read ports of caches). Over the years, VLIWs have been relegated to important but specific application domains where such issues are more under the control of the designers; similarly, statically scheduled CGRAs may prove inadequate for future general-purpose computing systems. In this paper, we introduce Elastic CGRAs, the superscalar processors of computing fabrics: no complex schedule needs to be computed at configuration time, and the operations execute dynamically in the CGRA when data are ready, thus exploiting the data parallelism that an application offers. We designed, down to a manufacturable layout, a simple CGRA on which we demonstrated and optimized our elastic control circuitry. We also built a complete compilation toolchain that transforms arbitrary C code into a configuration for the array. The area overhead (26.2%), critical path overhead (8.2%) and energy overhead (53.6%) of Elastic CGRAs over non-elastic CGRAs are significantly lower than the overhead of superscalar processors over VLIWs, while providing the same benefits. At such moderate costs, elasticity may prove to be one of the key enablers of widespread CGRA adoption.
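The contrast between static scheduling and data-driven execution can be illustrated with a toy timing model. The sketch below is not the authors' circuitry or toolchain. It assumes a hypothetical four-operation kernel, made-up latencies, and a static scheduler that must budget the worst-case latency for every load, and it ignores routing, resource contention and backpressure. All names (`Op`, `finish_times`, `load_a`, and so on) are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    inputs: tuple      # names of the operations that produce this op's operands
    latency: int       # latency actually observed in this run (cycles), assumed
    worst_case: int    # latency a static schedule would have to budget, assumed


def finish_times(ops, use_worst_case):
    """Return the finish cycle of each op; ops must be in topological order."""
    done = {}
    for op in ops:
        # An op starts as soon as its last operand is available.
        start = max((done[i] for i in op.inputs), default=0)
        lat = op.worst_case if use_worst_case else op.latency
        done[op.name] = start + lat
    return done


# Hypothetical kernel: c = a[i] + b[i], then store c.
# Both loads hit in the cache (1 cycle), but a miss could take 10 cycles.
ops = [
    Op("load_a", (), latency=1, worst_case=10),
    Op("load_b", (), latency=1, worst_case=10),
    Op("add", ("load_a", "load_b"), latency=1, worst_case=1),
    Op("store", ("add",), latency=1, worst_case=1),
]

elastic = finish_times(ops, use_worst_case=False)  # fire when data are ready
static = finish_times(ops, use_worst_case=True)    # fixed worst-case schedule

print(elastic["store"])  # 3
print(static["store"])   # 12
```

In this model the data-driven version finishes as soon as the operands actually arrive, while the fixed schedule pays the worst-case load latency even when both loads hit. A real static scheduler could instead stall the whole array on a miss; the worst-case budget is only one simple way to model the mismatch.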

