Efficient CORDIC-Based Activation Functions for RNN Acceleration on FPGAs

IEEE Transactions on Artificial Intelligence. Pub Date: 2024-10-07. DOI: 10.1109/TAI.2024.3474648

Wan Shen; Junye Jiang; Minghan Li; Shuanglong Liu

Volume 6, issue 1, pages 199-210. Publication type: journal article. Full text: https://ieeexplore.ieee.org/document/10706602/

Citations: 0

Abstract

Recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, have emerged as standard tools for tackling a wide range of time series applications, such as natural language processing. However, deploying these models on edge devices presents great challenges due to limited computational resources. Additionally, the implementation of RNN activation functions on low-end hardware devices significantly impacts the overall network performance, as activations constitute the dominant part of execution time. In this work, we propose an efficient approach for implementing commonly used RNN activations, leveraging an optimized coordinate rotation digital computer algorithm (CORDIC). Moreover, we propose a unified hardware architecture for mapping the CORDIC-based method onto field-programmable gate arrays (FPGAs), which can be configured to implement multiple nonlinear activation functions. Our architecture reduces the computational time with fewer iterations in CORDIC compared with existing methods, rendering it particularly suitable for resource-constrained edge devices. Our design is implemented on a Xilinx Zynq-7000 device and evaluated across three RNNs and benchmark datasets. Experimental results demonstrate that our design achieves up to a 2$\times$ speedup while maintaining model accuracy compared with the state-of-the-art designs.
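
For readers unfamiliar with the technique, the following is a minimal sketch of how sigmoid and tanh can be derived from the textbook hyperbolic CORDIC recurrence. It is not the optimized algorithm or the unified architecture proposed in the paper, whose details the abstract does not give. The function names, the 16-iteration default, the ln 2 range reduction, and the use of floating-point arithmetic (instead of the fixed-point shifts and adds an FPGA would use) are all assumptions made for illustration.

```python
import math

LN2 = math.log(2.0)

def hyperbolic_schedule(n_iter):
    """Shift indices 1, 2, 3, 4, 4, 5, ...; indices 4, 13, 40, ... are
    repeated, which the hyperbolic CORDIC needs in order to converge."""
    indices, i, next_repeat = [], 1, 4
    while len(indices) < n_iter:
        indices.append(i)
        if i == next_repeat and len(indices) < n_iter:
            indices.append(i)
            next_repeat = 3 * next_repeat + 1
        i += 1
    return indices

def cordic_sinh_cosh(theta, n_iter=16):
    """Rotation-mode hyperbolic CORDIC; valid for roughly |theta| < 1.1."""
    schedule = hyperbolic_schedule(n_iter)
    # Pre-divide by the constant CORDIC gain so that x converges to cosh(theta).
    gain = math.prod(math.sqrt(1.0 - 4.0 ** -i) for i in schedule)
    x, y, z = 1.0 / gain, 0.0, theta
    for i in schedule:
        d = 1.0 if z >= 0.0 else -1.0      # steer the residual angle toward 0
        shift = 2.0 ** -i                  # a plain bit shift in hardware
        x, y = x + d * y * shift, y + d * x * shift
        z -= d * math.atanh(shift)         # a small lookup table in hardware
    return y, x                            # (sinh, cosh)

def cordic_exp(t, n_iter=16):
    """exp(t) = 2**q * (cosh(r) + sinh(r)), with t = q*ln2 + r, 0 <= r < ln2."""
    q = math.floor(t / LN2)
    r = t - q * LN2
    s, c = cordic_sinh_cosh(r, n_iter)
    return math.ldexp(c + s, q)

def cordic_sigmoid(t, n_iter=16):
    e = cordic_exp(-abs(t), n_iter)        # always in (0, 1], so no overflow
    pos = 1.0 / (1.0 + e)                  # sigmoid(|t|); float division for brevity
    return pos if t >= 0 else 1.0 - pos

def cordic_tanh(t, n_iter=16):
    # Identity: tanh(t) = 2 * sigmoid(2t) - 1
    return 2.0 * cordic_sigmoid(2.0 * t, n_iter) - 1.0
```

In this sketch a single CORDIC kernel serves both activations, and the iteration count trades accuracy against latency: each additional iteration contributes roughly one more bit of precision, which is why reducing the number of iterations, as the abstract describes, translates directly into shorter computation time.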


Source journal

IEEE Transactions on Artificial Intelligence

CiteScore

7.70

Self-citation rate

0.00%

Number of publications