Learning to Design Accurate Deep Learning Accelerators with Inaccurate Multipliers

Paras Jain, Safeen Huda, Martin Maas, Joseph Gonzalez, I. Stoica, Azalia Mirhoseini
{"title":"Learning to Design Accurate Deep Learning Accelerators with Inaccurate Multipliers","authors":"Paras Jain, Safeen Huda, Martin Maas, Joseph Gonzalez, I. Stoica, Azalia Mirhoseini","doi":"10.23919/DATE54114.2022.9774607","DOIUrl":null,"url":null,"abstract":"Approximate computing is a promising way to improve the power efficiency of deep learning. While recent work proposes new arithmetic circuits (adders and multipliers) that consume substantially less power at the cost of computation errors, these approximate circuits decrease the end-to-end accuracy of common models. We present AutoApprox, a framework to automatically generate approximate low-power deep learning accelerators without any accuracy loss. AutoApprox generates a wide range of approximate ASIC accelerators with a TPUv3 systolic-array template. AutoApprox uses a learned router to assign each DNN layer to an approximate systolic array from a bank of arrays with varying approximation levels. By tailoring this routing for a specific neural network architecture, we discover circuit designs without the accuracy penalty from prior methods. Moreover, AutoApprox optimizes for the end-to-end performance, power and area of the the whole chip and PE mapping rather than simply measuring the performance of the arithmetic units in iso-lation. To our knowledge, our work is the first to demonstrate the effectiveness of custom-tailored approximate circuits in delivering significant chip-level energy savings with zero accuracy loss on a large-scale dataset such as ImageNet. AutoApprox synthesizes a novel approximate accelerator based on the TPU that reduces end-to-end power consumption by 3.2% and area by 5.2% at a sub-10nm process with no degradation in ImageNet validation top-1 and top-5 accuracy.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"220 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/DATE54114.2022.9774607","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Approximate computing is a promising way to improve the power efficiency of deep learning. While recent work proposes new arithmetic circuits (adders and multipliers) that consume substantially less power at the cost of computation errors, these approximate circuits decrease the end-to-end accuracy of common models. We present AutoApprox, a framework to automatically generate approximate low-power deep learning accelerators without any accuracy loss. AutoApprox generates a wide range of approximate ASIC accelerators with a TPUv3 systolic-array template. AutoApprox uses a learned router to assign each DNN layer to an approximate systolic array from a bank of arrays with varying approximation levels. By tailoring this routing for a specific neural network architecture, we discover circuit designs without the accuracy penalty from prior methods. Moreover, AutoApprox optimizes the end-to-end performance, power, and area of the whole chip and its PE mapping, rather than simply measuring the performance of the arithmetic units in isolation. To our knowledge, our work is the first to demonstrate the effectiveness of custom-tailored approximate circuits in delivering significant chip-level energy savings with zero accuracy loss on a large-scale dataset such as ImageNet. AutoApprox synthesizes a novel approximate accelerator based on the TPU that reduces end-to-end power consumption by 3.2% and area by 5.2% at a sub-10nm process with no degradation in ImageNet validation top-1 and top-5 accuracy.
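
The routing described in the abstract amounts to a constrained search: choose, per layer, the cheapest array in the bank whose cumulative multiplier error still leaves end-to-end accuracy untouched. A minimal sketch of that idea follows. The array bank, the error-to-accuracy proxy, and the greedy search below are invented for illustration only; the paper trains a learned router and measures accuracy on real models, so treat every name and number here as an assumption.

    # Toy illustration of the routing idea from the abstract: assign each DNN
    # layer to one systolic array from a bank with varying approximation
    # levels, cutting power while estimated accuracy stays at the exact
    # baseline. The bank, the error/accuracy model, and the greedy search are
    # hypothetical stand-ins, not the authors' implementation.

    # Hypothetical bank: higher levels use more aggressive approximate
    # multipliers (lower power, larger numerical error); level 0 is exact.
    ARRAY_BANK = {
        0: {"power_mw": 100.0, "mult_error": 0.000},
        1: {"power_mw": 90.0, "mult_error": 0.002},
        2: {"power_mw": 80.0, "mult_error": 0.005},
        3: {"power_mw": 70.0, "mult_error": 0.012},
    }

    BASELINE_TOP1 = 0.760    # assumed top-1 accuracy of the exact model
    ERROR_TOLERANCE = 0.020  # assumed error budget the DNN absorbs losslessly


    def estimate_accuracy(assignment):
        """Cheap stand-in for measuring end-to-end accuracy under a per-layer
        assignment; accuracy only drops once total error exceeds the budget."""
        total_error = sum(ARRAY_BANK[level]["mult_error"] for level in assignment)
        return BASELINE_TOP1 - 0.5 * max(0.0, total_error - ERROR_TOLERANCE)


    def total_power(assignment):
        return sum(ARRAY_BANK[level]["power_mw"] for level in assignment)


    def route_layers(num_layers, accuracy_floor):
        """Greedy stand-in for the learned router: keep raising a layer's
        approximation level while estimated accuracy stays at the floor."""
        assignment = [0] * num_layers  # all-exact start trivially satisfies floor
        improved = True
        while improved:
            improved = False
            for i in range(num_layers):
                if assignment[i] + 1 in ARRAY_BANK:
                    trial = assignment.copy()
                    trial[i] += 1
                    if estimate_accuracy(trial) >= accuracy_floor:
                        assignment, improved = trial, True
        return assignment


    if __name__ == "__main__":
        layers = 10
        routing = route_layers(layers, accuracy_floor=BASELINE_TOP1)  # zero loss
        print("per-layer array levels:", routing)
        print("power: %.0f mW (all-exact: %.0f mW)"
              % (total_power(routing), 100.0 * layers))
        print("estimated top-1: %.4f" % estimate_accuracy(routing))

Under these toy numbers the greedy pass settles on a mildly approximate array for every layer, a roughly 10% power cut at unchanged estimated accuracy; the paper replaces both the accuracy proxy and the search with a learned router evaluated on real ImageNet accuracy.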