Reconfigurable Stream-based Tensor Unit with Variable-Precision Posit Arithmetic

Nuno Neves, P. Tomás, N. Roma
{"title":"Reconfigurable Stream-based Tensor Unit with Variable-Precision Posit Arithmetic","authors":"Nuno Neves, P. Tomás, N. Roma","doi":"10.1109/ASAP49362.2020.00033","DOIUrl":null,"url":null,"abstract":"The increased adoption of DNN applications drove the emergence of dedicated tensor computing units to accelerate multi-dimensional matrix multiplication operations. Although they deploy highly efficient computing architectures, they often lack support for more general-purpose application domains. Such a limitation occurs both due to their consolidated computation scheme (restricted to matrix multiplication) and due to their frequent adoption of low-precision/custom floating-point formats (unsuited for general application domains). In contrast, this paper proposes a new Reconfigurable Tensor Unit (RTU) which deploys an array of variable-precision Vector MultiplyAccumulate (VMA) units. Furthermore, each VMA unit leverages the new Posit floating-point format and supports the full range of standardized posit precisions in a single SIMD unit, with variable vector-element width. Moreover, the proposed RTU explores the Posit format features for fused operations, together with spatial and time-multiplexing reconfiguration mechanisms to fuse and combine multiple VMAs to map high-level and complex operations. The RTU is also supported by an automatic data streaming infrastructure and a pipelined data movement scheme, allowing it to accelerate the computation of most data-parallel patterns commonly present in vectorizable applications. 
The proposed RTU showed to outperform state-of-the-art tensor and SIMD units, present in off-the-shelf platforms, in turn resulting in significant energy-efficiency improvements.","PeriodicalId":375691,"journal":{"name":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP49362.2020.00033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

The increasing adoption of DNN applications has driven the emergence of dedicated tensor computing units to accelerate multi-dimensional matrix multiplication operations. Although these units deploy highly efficient computing architectures, they often lack support for more general-purpose application domains. This limitation stems both from their consolidated computation scheme (restricted to matrix multiplication) and from their frequent adoption of low-precision/custom floating-point formats (unsuited for general application domains). In contrast, this paper proposes a new Reconfigurable Tensor Unit (RTU) that deploys an array of variable-precision Vector Multiply-Accumulate (VMA) units. Each VMA unit leverages the new Posit floating-point format and supports the full range of standardized posit precisions in a single SIMD unit, with variable vector-element width. Moreover, the proposed RTU exploits the Posit format's features for fused operations, together with spatial and time-multiplexing reconfiguration mechanisms, to fuse and combine multiple VMAs and map high-level, complex operations. The RTU is also supported by an automatic data-streaming infrastructure and a pipelined data-movement scheme, allowing it to accelerate most data-parallel patterns commonly present in vectorizable applications. The proposed RTU was shown to outperform state-of-the-art tensor and SIMD units present in off-the-shelf platforms, in turn resulting in significant energy-efficiency improvements.
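The abstract itself contains no code, but the variable-precision posit encoding the VMA units operate on can be illustrated in software. The sketch below is a minimal software decoder for a standard posit value (sign, regime run, `es` exponent bits, fraction), following the public posit format definition; the function name `posit_to_float` and the pure-Python structure are illustrative assumptions, not part of the paper's hardware design, which decodes these fields in reconfigurable SIMD lanes instead.

```python
def posit_to_float(bits: int, nbits: int, es: int) -> float:
    """Decode an nbits-wide posit with es exponent bits into a Python float.

    Illustrative software sketch of the posit format: a posit packs a sign
    bit, a variable-length 'regime' run, es exponent bits, and a fraction.
    Its value is (-1)^sign * 2^(regime * 2^es + exp) * (1 + fraction).
    """
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return float("nan")  # NaR (Not a Real)
    sign = bits >> (nbits - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are two's-complemented

    # Regime: run length of identical bits right after the sign bit.
    s = format(bits & ((1 << (nbits - 1)) - 1), f"0{nbits - 1}b")
    r0 = s[0]
    run = len(s) - len(s.lstrip(r0))
    regime = (run - 1) if r0 == "1" else -run

    # Skip the run and its terminating (opposite) bit, if any remain.
    rem = s[run + 1:] if run < len(s) else ""
    exp_bits = rem[:es].ljust(es, "0") if es else ""   # missing bits are 0
    exp = int(exp_bits, 2) if exp_bits else 0
    frac_bits = rem[es:]
    frac = int(frac_bits, 2) / (1 << len(frac_bits)) if frac_bits else 0.0

    value = (1.0 + frac) * 2.0 ** (regime * (1 << es) + exp)
    return -value if sign else value


# Decode a few posit8 (es = 0) bit patterns:
print(posit_to_float(0x40, 8, 0))  # 1.0
print(posit_to_float(0x60, 8, 0))  # 2.0
print(posit_to_float(0xC0, 8, 0))  # -1.0
```

The variable-length regime field is what gives posits their tapered precision (more fraction bits near 1.0, wider dynamic range at the extremes), and it is also why a hardware unit supporting "the full range of standardized posit precisions" must handle variable field widths per lane.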