TransPar: Transformation based dynamic Parallelism for low power CGRAs

2014 24th International Conference on Field Programmable Logic and Applications (FPL), Pub Date: 2014-10-20, DOI: 10.1109/FPL.2014.6927485

Syed M. A. H. Jafri, G. Serrano, M. Daneshtalab, Naeem Abbas, A. Hemani, K. Paul, J. Plosila, H. Tenhunen

Citations: 5

Abstract

Coarse Grained Reconfigurable Architectures (CGRAs) are emerging as enabling platforms to meet the high performance demanded by modern applications (e.g. 4G and CDMA). Recently proposed CGRAs offer runtime parallelism to reduce energy consumption: a parallel version can deliver the same performance at a lower voltage and frequency. To implement runtime parallelism, CGRAs commonly store multiple compile-time generated implementations of an application (with different degrees of parallelism) and select the optimal version at runtime. However, compile-time binding incurs excessive configuration memory overheads and/or is unable to parallelize an application even when sufficient resources are available. As a solution to this problem, we propose Transformation based dynamic Parallelism (TransPar). TransPar stores only a single implementation and applies a series of transformations to generate the bitstream for the parallel version. In addition, it allows an application to be displaced and/or rotated so that it can still be parallelized in resource-constrained scenarios. By storing only a single implementation, TransPar significantly reduces configuration memory requirements (by up to 73% for the tested applications) compared to state-of-the-art compaction techniques. Simulation and synthesis results, using real applications, reveal that the additional flexibility allows up to 33% energy reduction compared to static-memory-based parallelism techniques. Gate-level analysis reveals that TransPar incurs negligible silicon (0.2% of the platform) and timing (6 additional cycles per application) penalties.
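The abstract's central idea is to keep one stored implementation and derive the extra copy needed for a parallel version by transformation, displacing and/or rotating it when free fabric is scarce. The following is a minimal Python sketch of only the geometric part of that idea, under a hypothetical model: an implementation is a dictionary from fabric cell `(row, col)` to an opaque configuration word. The names `displace`, `normalise`, `rotate90` and `find_placement`, the dictionary model, and the search order are illustrative assumptions, not the paper's bitstream format or algorithm. The gate-level figures in the abstract suggest the real mechanism is implemented in hardware.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical model (not the paper's bitstream format): an implementation is a
# map from a fabric cell (row, col) to an opaque configuration word for that cell.
Cell = Tuple[int, int]
Config = Dict[Cell, str]


def displace(cfg: Config, dr: int, dc: int) -> Config:
    """Shift every configured cell by dr rows and dc columns."""
    return {(r + dr, c + dc): word for (r, c), word in cfg.items()}


def normalise(cfg: Config) -> Config:
    """Move the footprint so its bounding box starts at (0, 0)."""
    if not cfg:
        raise ValueError("configuration must occupy at least one cell")
    min_r = min(r for r, _ in cfg)
    min_c = min(c for _, c in cfg)
    return displace(cfg, -min_r, -min_c)


def rotate90(cfg: Config) -> Config:
    """Rotate the footprint 90 degrees clockwise (row index grows downward).

    Simplification: only cell positions move; a real CGRA would also have to
    rewrite direction-dependent routing fields inside each word.
    """
    return normalise({(c, -r): word for (r, c), word in cfg.items()})


def find_placement(cfg: Config, rows: int, cols: int,
                   occupied: Set[Cell]) -> Optional[Config]:
    """Try rotations of 0/90/180/270 degrees and every displacement; return the
    first copy of cfg that fits in a rows x cols fabric without touching any
    occupied cell, or None if no such copy exists."""
    candidate = normalise(cfg)
    for _ in range(4):
        height = max(r for r, _ in candidate) + 1
        width = max(c for _, c in candidate) + 1
        for dr in range(rows - height + 1):
            for dc in range(cols - width + 1):
                placed = displace(candidate, dr, dc)
                if occupied.isdisjoint(placed):  # iterates the placed cells
                    return placed
        candidate = rotate90(candidate)
    return None  # no free region: keep running the serial version


if __name__ == "__main__":
    # Stored (serial) implementation: a 1x2 strip in row 0.
    serial = {(0, 0): "A", (0, 1): "B"}
    # Column 2 is the only free region of a 3x3 fabric.
    busy = set(serial) | {(1, 0), (1, 1), (2, 0), (2, 1)}
    print(find_placement(serial, 3, 3, busy))  # {(0, 2): 'A', (1, 2): 'B'}
```

In the example, no horizontal copy fits, so the search rotates the strip to a vertical 2x1 footprint and displaces it into the free column. Because every displacement of the unrotated footprint is tried first, an unrotated copy is preferred whenever one fits; that ordering is a choice of this sketch, not something the abstract specifies.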
