Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks

Patrick Judd, Jorge Albericio, Tayler H. Hetherington, Tor M. Aamodt, Natalie D. Enright Jerger, Andreas Moshovos
{"title":"Proteus: Exploiting Numerical Precision Variability in Deep Neural Networks","authors":"Patrick Judd, Jorge Albericio, Tayler H. Hetherington, Tor M. Aamodt, Natalie D. Enright Jerger, Andreas Moshovos","doi":"10.1145/2925426.2926294","DOIUrl":null,"url":null,"abstract":"This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their recently demonstrated ability to tolerate representations of different precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform representation. This work proposes Proteus, which reduces the data traffic and storage footprint needed by DNNs, resulting in reduced energy and improved area efficiency for DNN implementations. Proteus uses a different representation per layer for both the data (neurons) and the weights (synapses) processed by DNNs. Proteus is a layered extension over existing DNN implementations that converts between the numerical representation used by the DNN execution engines and the shorter, layer-specific fixed-point representation used when reading and writing data values to memory be it on-chip buffers or off-chip memory. Proteus uses a novel memory layout for DNN data, enabling a simple, low-cost and low-energy conversion unit. We evaluate Proteus as an extension to a state-of-the-art accelerator [7] which uses a uniform 16-bit fixed-point representation. On five popular DNNs Proteus reduces data traffic among layers by 43% on average while maintaining accuracy within 1% even when compared to a single precision floating-point implementation. As a result, Proteus improves energy by 15% with no performance loss. Proteus also reduces the data footprint by at least 38% and hence the amount of on-chip buffering needed resulting in an implementation that requires 20% less area overall. This area savings can be used to improve cost by building smaller chips, to process larger DNNs for the same on-chip area, or to incorporate an additional three execution engines increasing peak performance bandwidth by 18%.","PeriodicalId":422112,"journal":{"name":"Proceedings of the 2016 International Conference on Supercomputing","volume":"78 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"95","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2925426.2926294","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 95

Abstract

This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced-precision numerical representations and, specifically, their recently demonstrated ability to tolerate representations of different precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform representation. This work proposes Proteus, which reduces the data traffic and storage footprint needed by DNNs, resulting in reduced energy and improved area efficiency for DNN implementations. Proteus uses a different representation per layer for both the data (neurons) and the weights (synapses) processed by DNNs. Proteus is a layered extension over existing DNN implementations that converts between the numerical representation used by the DNN execution engines and the shorter, layer-specific fixed-point representation used when reading and writing data values to memory, be it on-chip buffers or off-chip memory. Proteus uses a novel memory layout for DNN data, enabling a simple, low-cost, and low-energy conversion unit. We evaluate Proteus as an extension to a state-of-the-art accelerator [7] which uses a uniform 16-bit fixed-point representation. On five popular DNNs, Proteus reduces data traffic among layers by 43% on average while maintaining accuracy within 1%, even when compared to a single-precision floating-point implementation. As a result, Proteus improves energy by 15% with no performance loss. Proteus also reduces the data footprint by at least 38% and hence the amount of on-chip buffering needed, resulting in an implementation that requires 20% less area overall. This area savings can be used to reduce cost by building smaller chips, to process larger DNNs for the same on-chip area, or to incorporate an additional three execution engines, increasing peak performance bandwidth by 18%.
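The core idea in the abstract, storing each layer's values in a shorter, layer-specific fixed-point format and converting to the accelerator's uniform 16-bit format as data moves to and from memory, can be illustrated with a minimal software sketch. The sketch below is not the paper's hardware conversion unit or memory layout; the layer names, the per-layer (integer bits, fraction bits) profile, and the random weights are placeholders assumed only for illustration of how per-layer precision shrinks the storage footprint.

```python
import numpy as np

def quantize_layer(values, int_bits, frac_bits):
    """Quantize float values to a signed fixed-point grid with a per-layer
    precision of (1 sign + int_bits + frac_bits) bits, saturating at the
    representable range, then return them as floats for accuracy checks."""
    scale = 2.0 ** frac_bits
    total_bits = 1 + int_bits + frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    qmin = -(2 ** (total_bits - 1))
    q = np.clip(np.round(values * scale), qmin, qmax)
    return q / scale

# Hypothetical per-layer precision profile: (integer bits, fraction bits).
# In Proteus such a profile would be chosen per network so that accuracy
# stays within 1% of the full-precision baseline.
layer_profile = {"conv1": (2, 8), "conv2": (1, 7), "fc1": (0, 5)}

baseline_bits = 16  # uniform 16-bit fixed point used by the execution engine
reductions = []
for layer, (i_bits, f_bits) in layer_profile.items():
    weights = np.random.randn(1024).astype(np.float32)  # placeholder weights
    _ = quantize_layer(weights, i_bits, f_bits)          # values seen by the engine
    stored_bits = 1 + i_bits + f_bits                     # bits written to memory
    reductions.append(1.0 - stored_bits / baseline_bits)

print(f"average storage footprint reduction: {np.mean(reductions):.0%}")
```

In this sketch the quantized values are widened back to the engine's format in software; in Proteus that conversion is performed by a dedicated low-cost unit on the path between the execution engines and memory, so only the narrow per-layer encodings ever occupy buffers or DRAM bandwidth.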