Usage-Based RTL Subsetting for Hardware Accelerators

2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD) Pub Date : 2022-10-29 DOI:10.1145/3508352.3549391

Qinhan Tan, Aarti Gupta, S. Malik



Abstract

Recent years have witnessed increasing use of domain-specific accelerators in computing platforms to provide power-performance efficiency for emerging applications. To increase their applicability within the domain, these accelerators tend to support a large set of functions; e.g., Nvidia’s open-source Deep Learning Accelerator, NVDLA, supports five distinct groups of functions [17]. However, an individual use case of an accelerator may utilize only a subset of these functions. The unused functions lead to unnecessary overhead in silicon area, power, and hardware verification/hardware-software co-verification complexity. This motivates our research question: Given an RTL design for an accelerator and a subset of functions of interest, can we automatically extract a subset of the RTL that is sufficient for these functions and sequentially equivalent to the original RTL? We call this the Usage-based RTL Subsetting problem, or the RTL subsetting problem for short. We first formally define this problem and show that it can be formulated as a program synthesis problem, which can be solved by performing expensive hyperproperty checks. To overcome the high cost, we propose multiple levels of sound over-approximations to construct an effective algorithm based on relatively less expensive temporal property checking and taint analysis for information flow checking. We demonstrate the acceptable computation cost and the quality of the results of our algorithm through several case studies of accelerators from different domains. The applicability of our proposed algorithm can be seen in its ability to subset the large NVDLA accelerator (with over 50,000 registers and 1,600,000 gates) for the group of convolution functions, where the subset reduces the total number of registers by 18.6% and the total number of gates by 37.1%.
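
The information-flow intuition behind usage-based subsetting can be illustrated on a toy design. The sketch below is not the authors' algorithm: it is a plain structural cone-of-influence pass, written under the assumption that the functions of interest can be expressed by pinning some configuration inputs to constants. The netlist encoding, the signal names (`mode`, `conv_acc`, `pool_max`), and the helper functions are all hypothetical. The pass is conservative in the sense that it never treats a register as constant, so it may keep more logic than strictly necessary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    kind: str                     # "input", "const", "reg", "mux", or "op"
    args: tuple = ()              # fan-in names; for "mux": (sel, if_zero, if_one)
    value: Optional[int] = None   # used only when kind == "const"


def const_value(netlist, name, pinned):
    """Return the value of `name` if it is constant under `pinned`, else None.

    Registers and generic ops are never treated as constant (conservative).
    Assumes the combinational logic has no cycles.
    """
    node = netlist[name]
    if node.kind == "input":
        return pinned.get(name)
    if node.kind == "const":
        return node.value
    if node.kind == "mux":
        sel = const_value(netlist, node.args[0], pinned)
        if sel is not None:
            return const_value(netlist, node.args[2 if sel else 1], pinned)
    return None


def live_signals(netlist, outputs, pinned):
    """Signals that may still influence `outputs` once `pinned` inputs are fixed."""
    live, stack = set(), list(outputs)
    while stack:
        name = stack.pop()
        if name in live:
            continue
        live.add(name)
        node = netlist[name]
        if node.kind == "mux":
            sel = const_value(netlist, node.args[0], pinned)
            if sel is not None:
                # Select is fixed: only the chosen branch can reach the output.
                stack.append(node.args[2 if sel else 1])
                continue
        stack.extend(node.args)
    return live


def kept_registers(netlist, outputs, pinned):
    """Sorted names of the registers that survive subsetting."""
    live = live_signals(netlist, outputs, pinned)
    return sorted(n for n in live if netlist[n].kind == "reg")


# Hypothetical two-function accelerator: mode 0 selects the convolution
# datapath, mode 1 selects the pooling datapath.
netlist = {
    "mode": Node("input"),
    "data": Node("input"),
    "conv_sum": Node("op", ("conv_acc", "data")),
    "conv_acc": Node("reg", ("conv_sum",)),
    "pool_cmp": Node("op", ("pool_max", "data")),
    "pool_max": Node("reg", ("pool_cmp",)),
    "out": Node("mux", ("mode", "conv_acc", "pool_max")),
}

print(kept_registers(netlist, ["out"], {"mode": 0}))
print(kept_registers(netlist, ["out"], {}))
```

With `mode` pinned to 0, only the convolution register remains in the cone of influence of `out`; with no pinning, both registers are kept. A purely structural pass like this one says nothing about sequential equivalence, which is why the abstract pairs information-flow checking with temporal property checking.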
