A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions

Xuan Feng, Yue Li, Yu Qian, Jingbo Gao, Wei Cao, Lingli Wang
{"title":"A High-Precision Flexible Symmetry-Aware Architecture for Element-Wise Activation Functions","authors":"Xuan Feng, Yue Li, Yu Qian, Jingbo Gao, Wei Cao, Lingli Wang","doi":"10.1109/ICFPT52863.2021.9609865","DOIUrl":null,"url":null,"abstract":"Nonlinear activation functions (NAFs) play an essential role in deep neural networks (DNNs). Since versatile DNN accelerators need to support various DNNs which contain different NAFs, the flexible hardware design supporting those NAFs has become crucial. However, there are few high-precision flexible hardware architectures, and the symmetries of different NAFs have not been fully studied. This paper proposes a high-precision symmetry-aware architecture based on piecewise linear approximation. Through the reconfigurable data path, the architecture can support various typical NAFs. The efficient non-uniform segmentation scheme is proposed to achieve high precision for each NAF. Besides, the utilization of unified symmetry for NAFs can save half the memory. To reduce the computational cost, a 25×18 DSP is shared by two INT 7×9 multipliers with two independent inputs. The architecture is implemented on Xilinx ZC706 at a frequency of 410MHz. Compared with the state-of-the-art flexible nonlinear core, our flexible architecture costs fewer hardware resources with higher precision. Applying the design to BERT-BASE, MobileNetV3, and EfficientNet-B3 on the PyTorch platform, experimental results show that the accuracy loss is either 0 for BERT-BASE, or 0.002% for EfficientNet-B3. For MobileNetV3, the accuracy is even improved by 0.01%.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT52863.2021.9609865","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Nonlinear activation functions (NAFs) play an essential role in deep neural networks (DNNs). Since versatile DNN accelerators must support many DNNs containing different NAFs, flexible hardware that can evaluate those NAFs has become crucial. However, few high-precision flexible hardware architectures exist, and the symmetries of different NAFs have not been fully studied. This paper proposes a high-precision symmetry-aware architecture based on piecewise linear approximation. Through a reconfigurable datapath, the architecture supports a range of typical NAFs. An efficient non-uniform segmentation scheme achieves high precision for each NAF, and exploiting the unified symmetry of NAFs saves half the memory. To reduce computational cost, one 25×18 DSP is shared by two INT 7×9 multipliers with two independent inputs. The architecture is implemented on a Xilinx ZC706 at 410 MHz. Compared with the state-of-the-art flexible nonlinear core, our architecture uses fewer hardware resources while achieving higher precision. Applying the design to BERT-BASE, MobileNetV3, and EfficientNet-B3 on the PyTorch platform, experimental results show that the accuracy loss is 0 for BERT-BASE and 0.002% for EfficientNet-B3, while the accuracy of MobileNetV3 even improves by 0.01%.
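To make the core idea concrete, below is a minimal Python sketch of symmetry-aware piecewise linear (PWL) approximation for an odd-symmetric NAF (tanh). The segment boundaries, the endpoint-interpolation fit, and the function name `pwl_tanh` are illustrative assumptions, not the paper's actual non-uniform segmentation scheme or fixed-point hardware datapath. The point is only the memory argument from the abstract: odd symmetry means the lookup table needs to cover x ≥ 0 only, with a sign flip recovering the negative half, so the stored segments are halved; and non-uniform segments can be placed densely where the function curves most.

```python
import numpy as np

# Hypothetical illustration: symmetry-aware PWL evaluation of tanh.
# Boundaries, slopes, and intercepts are example values, not the
# paper's optimized segmentation.

# Non-uniform segment boundaries on the non-negative half-axis only:
# tanh changes fastest near 0, so segments are denser there.
BOUNDS = np.array([0.0, 0.25, 0.5, 1.0, 2.0, 4.0])

def fit_segments(f, bounds):
    """Fit one linear piece (slope k, intercept b) per segment by
    matching f at the segment endpoints (a simple choice; the paper
    selects segments to maximize precision)."""
    ks, bs = [], []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        k = (f(hi) - f(lo)) / (hi - lo)
        ks.append(k)
        bs.append(f(lo) - k * lo)
    return np.array(ks), np.array(bs)

# The LUT stores coefficients for the positive half-domain only.
K, B = fit_segments(np.tanh, BOUNDS)

def pwl_tanh(x):
    """Evaluate tanh via PWL, exploiting odd symmetry:
    tanh(-x) = -tanh(x), so only x >= 0 is tabulated."""
    sign = np.sign(x)
    ax = np.abs(x)
    if ax >= BOUNDS[-1]:          # saturate beyond the last boundary
        return sign * 1.0
    i = np.searchsorted(BOUNDS, ax, side="right") - 1
    return sign * (K[i] * ax + B[i])

for x in (-3.0, -0.4, 0.0, 0.7, 5.0):
    print(f"x={x:+.1f}  pwl={pwl_tanh(x):+.4f}  ref={np.tanh(x):+.4f}")
```

Even-symmetric or shift-symmetric NAFs admit the same halving with a different reconstruction step; the hardware version would replace the float slope/intercept multiply-add with the fixed-point multipliers packed into the shared 25×18 DSP described above.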