A Framework for Adding Low-Overhead, Fine-Grained Power Domains to CGRAs

2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). Pub Date: 2020-03-01. DOI: 10.23919/DATE48585.2020.9116477

Ankita Nayak, Keyi Zhang, Rajsekhar Setaluri, Alex Carsello, Makai Mann, S. Richardson, Rick Bahr, P. Hanrahan, M. Horowitz, Priyanka Raina

Citations: 4

Abstract

To effectively minimize static power for a wide range of applications, power domains for a coarse-grained reconfigurable array (CGRA) need to be finer-grained than those of a typical ASIC. However, the special isolation logic needed to ensure electrical protection between off and on domains makes fine-grained power domains area- and timing-inefficient. We propose a novel design of the CGRA routing fabric that intrinsically provides boundary protection. This technique reduces the area overhead of boundary protection between power domains for the CGRA from around 9% to less than 1% and removes the delay from the isolation cells. However, with this design choice, we cannot leverage the conventional UPF-based flow to introduce power domain boundary protection. We create compiler-like passes that iteratively introduce the needed design transformations, and formally verify the passes with satisfiability modulo theories (SMT) methods. These passes also allow us to optimize how we handle test and debug signals through the off tiles. We use our framework to insert power domains into an SoC with an ARM Cortex-M3 processor and a CGRA with 32 × 16 processing element (PE) and memory tiles and 4 MB of secondary memory. Depending on the size of the applications mapped, our CGRA achieves up to an 83% reduction in leakage power and a 26% reduction in total power versus a CGRA without multiple power domains, for a range of image processing and machine learning applications.
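The dependence of leakage savings on application size can be illustrated with a deliberately simple model. The sketch below is not the authors' power model and does not reproduce the reported 83% figure. It assumes that every tile in the 32 × 16 array leaks equally when powered, that a gated tile leaks nothing, and that a hypothetical `always_on_fraction` of each tile's leakage cannot be gated. The function name and parameters are illustrative only.

```python
def leakage_reduction(tiles_used: int,
                      total_tiles: int = 32 * 16,
                      always_on_fraction: float = 0.0) -> float:
    """Return the fraction of array leakage saved by gating unused tiles.

    Assumptions (not taken from the paper):
      * all tiles have identical leakage when powered on;
      * a power-gated tile contributes zero leakage;
      * `always_on_fraction` of each tile's leakage stays on even when
        the tile is gated (hypothetical parameter).
    """
    if not 0 <= tiles_used <= total_tiles:
        raise ValueError("tiles_used must be between 0 and total_tiles")
    if not 0.0 <= always_on_fraction <= 1.0:
        raise ValueError("always_on_fraction must be between 0 and 1")

    # Share of the array that the mapped application leaves unused.
    off_fraction = (total_tiles - tiles_used) / total_tiles
    # Only the gateable part of each unused tile's leakage is saved.
    return (1.0 - always_on_fraction) * off_fraction


if __name__ == "__main__":
    # Smaller applications leave more tiles off, so they save more leakage.
    for used in (64, 128, 256, 512):
        print(used, "tiles used ->", leakage_reduction(used))
```

Under these assumptions the saving scales linearly with the number of unused tiles, which is why finer-grained domains help small applications most. Real savings also depend on switch-cell overhead and non-tile logic, which this sketch ignores.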
