CGRA express:利用动态运算融合加速执行

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems Pub Date : 2009-10-11 DOI:10.1145/1629395.1629433

Yongjun Park, Hyunchul Park, S. Mahlke

{"title":"CGRA express:利用动态运算融合加速执行","authors":"Yongjun Park, Hyunchul Park, S. Mahlke","doi":"10.1145/1629395.1629433","DOIUrl":null,"url":null,"abstract":"Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used for innermost loops that contain an abundant of instruction-level parallelism. Conversely, non-loop and outer-loop code are latency constrained and do not offer significant amounts of instruction-level parallelism. In these situations, CGRAs are ineffective as the majority of the resources remain idle. In this paper, dynamic operation fusion is introduced to enable CGRAs to effectively accelerate latency-constrained code regions. Dynamic operation fusion is enabled through the combination of a small bypass network added between function units in a conventional CGRA and a sub-cycle modulo scheduler to automatically identify opportunities for fusion. Results show that dynamic operation fusion reduced total application run-time by up to 17% on a 4x4 CGRA.","PeriodicalId":136293,"journal":{"name":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","volume":"120 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"65","resultStr":"{\"title\":\"CGRA express: accelerating execution using dynamic operation fusion\",\"authors\":\"Yongjun Park, Hyunchul Park, S. Mahlke\",\"doi\":\"10.1145/1629395.1629433\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used for innermost loops that contain an abundant of instruction-level parallelism. Conversely, non-loop and outer-loop code are latency constrained and do not offer significant amounts of instruction-level parallelism. In these situations, CGRAs are ineffective as the majority of the resources remain idle. In this paper, dynamic operation fusion is introduced to enable CGRAs to effectively accelerate latency-constrained code regions. Dynamic operation fusion is enabled through the combination of a small bypass network added between function units in a conventional CGRA and a sub-cycle modulo scheduler to automatically identify opportunities for fusion. Results show that dynamic operation fusion reduced total application run-time by up to 17% on a 4x4 CGRA.\",\"PeriodicalId\":136293,\"journal\":{\"name\":\"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems\",\"volume\":\"120 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-10-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"65\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1629395.1629433\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Compilers, Architecture, and Synthesis for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1629395.1629433","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 65

摘要

粗粒度可重构体系结构(CGRAs)通过提供具有高计算吞吐量、可伸缩性、低成本和能源效率潜力的可编程性，提供了一个吸引人的硬件平台。CGRAs已被有效地用于包含大量指令级并行性的最内层循环。相反，非循环和外循环代码受到延迟限制，不能提供大量的指令级并行性。在这些情况下，由于大部分资源处于闲置状态，CGRAs是无效的。本文引入动态运算融合，使CGRAs能够有效地加速延迟约束码域。动态操作融合是通过在传统CGRA的功能单元之间添加一个小型旁路网络和一个子周期模调度器的组合来实现的，以自动识别融合的机会。结果表明，在4x4 CGRA上，动态操作融合将总应用程序运行时间减少了17%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

CGRA express: accelerating execution using dynamic operation fusion

Coarse-grained reconfigurable architectures (CGRAs) present an appealing hardware platform by providing programmability with the potential for high computation throughput, scalability, low cost, and energy efficiency. CGRAs have been effectively used for innermost loops that contain an abundant of instruction-level parallelism. Conversely, non-loop and outer-loop code are latency constrained and do not offer significant amounts of instruction-level parallelism. In these situations, CGRAs are ineffective as the majority of the resources remain idle. In this paper, dynamic operation fusion is introduced to enable CGRAs to effectively accelerate latency-constrained code regions. Dynamic operation fusion is enabled through the combination of a small bypass network added between function units in a conventional CGRA and a sub-cycle modulo scheduler to automatically identify opportunities for fusion. Results show that dynamic operation fusion reduced total application run-time by up to 17% on a 4x4 CGRA.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Conference on Compilers, Architecture, and Synthesis for Embedded Systems

自引率

0.00%

发文量