A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions

Xuan Feng, Yue Li, Yu Qian, Jingbo Gao, Wei Cao, Lingli Wang
{"title":"A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions","authors":"Xuan Feng, Yue Li, Yu Qian, Jingbo Gao, Wei Cao, Lingli Wang","doi":"10.1109/ICFPT52863.2021.9609865","DOIUrl":null,"url":null,"abstract":"Nonlinear activation functions (NAFs) play an essential role in deep neural networks (DNNs). Since versatile DNN accelerators need to support various DNNs which contain different NAFs, the flexible hardware design supporting those NAFs has become crucial. However, there are few high-precision flexible hardware architectures, and the symmetries of different NAFs have not been fully studied. This paper proposes a high-precision symmetry-aware architecture based on piecewise linear approximation. Through the reconfigurable data path, the architecture can support various typical NAFs. The efficient non-uniform segmentation scheme is proposed to achieve high precision for each NAF. Besides, the utilization of unified symmetry for NAFs can save half the memory. To reduce the computational cost, a 25×18 DSP is shared by two INT 7×9 multipliers with two independent inputs. The architecture is implemented on Xilinx ZC706 at a frequency of 410MHz. Compared with the state-of-the-art flexible nonlinear core, our flexible architecture costs fewer hardware resources with higher precision. Applying the design to BERT-BASE, MobileNetV3, and EfficientNet-B3 on the PyTorch platform, experimental results show that the accuracy loss is either 0 for BERT-BASE, or 0.002% for EfficientNet-B3. For MobileNetV3, the accuracy is even improved by 0.01%.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT52863.2021.9609865","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Nonlinear activation functions (NAFs) play an essential role in deep neural networks (DNNs). Since versatile DNN accelerators must support many DNNs containing different NAFs, flexible hardware that can evaluate those NAFs has become crucial. However, few high-precision flexible hardware architectures exist, and the symmetries of different NAFs have not been fully studied. This paper proposes a high-precision symmetry-aware architecture based on piecewise linear approximation. Through a reconfigurable datapath, the architecture supports a range of typical NAFs. An efficient non-uniform segmentation scheme achieves high precision for each NAF, and exploiting the unified symmetry of NAFs saves half the memory. To reduce computational cost, one 25×18 DSP is shared by two INT 7×9 multipliers with two independent inputs. The architecture is implemented on a Xilinx ZC706 at 410 MHz. Compared with the state-of-the-art flexible nonlinear core, our architecture uses fewer hardware resources while achieving higher precision. Applying the design to BERT-BASE, MobileNetV3, and EfficientNet-B3 on the PyTorch platform, experimental results show that the accuracy loss is 0 for BERT-BASE and 0.002% for EfficientNet-B3, while the accuracy of MobileNetV3 even improves by 0.01%.
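To make the core idea concrete, below is a minimal Python sketch of symmetry-aware piecewise linear (PWL) approximation for an odd-symmetric NAF (tanh). The segment boundaries, the endpoint-interpolation fit, and the function name `pwl_tanh` are illustrative assumptions, not the paper's actual non-uniform segmentation scheme or fixed-point hardware datapath. The point is only the memory argument from the abstract: odd symmetry means the lookup table needs to cover x ≥ 0 only, with a sign flip recovering the negative half, so the stored segments are halved; and non-uniform segments can be placed densely where the function curves most.

```python
import numpy as np

# Hypothetical illustration: symmetry-aware PWL evaluation of tanh.
# Boundaries, slopes, and intercepts are example values, not the
# paper's optimized segmentation.

# Non-uniform segment boundaries on the non-negative half-axis only:
# tanh changes fastest near 0, so segments are denser there.
BOUNDS = np.array([0.0, 0.25, 0.5, 1.0, 2.0, 4.0])

def fit_segments(f, bounds):
    """Fit one linear piece (slope k, intercept b) per segment by
    matching f at the segment endpoints (a simple choice; the paper
    selects segments to maximize precision)."""
    ks, bs = [], []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        k = (f(hi) - f(lo)) / (hi - lo)
        ks.append(k)
        bs.append(f(lo) - k * lo)
    return np.array(ks), np.array(bs)

# The LUT stores coefficients for the positive half-domain only.
K, B = fit_segments(np.tanh, BOUNDS)

def pwl_tanh(x):
    """Evaluate tanh via PWL, exploiting odd symmetry:
    tanh(-x) = -tanh(x), so only x >= 0 is tabulated."""
    sign = np.sign(x)
    ax = np.abs(x)
    if ax >= BOUNDS[-1]:          # saturate beyond the last boundary
        return sign * 1.0
    i = np.searchsorted(BOUNDS, ax, side="right") - 1
    return sign * (K[i] * ax + B[i])

for x in (-3.0, -0.4, 0.0, 0.7, 5.0):
    print(f"x={x:+.1f}  pwl={pwl_tanh(x):+.4f}  ref={np.tanh(x):+.4f}")
```

Even-symmetric or shift-symmetric NAFs admit the same halving with a different reconstruction step; the hardware version would replace the float slope/intercept multiply-add with the fixed-point multipliers packed into the shared 25×18 DSP described above.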