{"authors":"Kai Qian, Yinqiu Liu, Zexu Zhang, Kun Wang","doi":"10.1109/ISCAS46773.2023.10181406","journal":{"name":"2023 IEEE International Symposium on Circuits and Systems (ISCAS)"},"publicationDate":"2023-05-21","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ISCAS46773.2023.10181406"}
Efficient Implementation of Activation Function on FPGA for Accelerating Neural Networks
In this paper, we present the Integer Lightweight Softmax (ILS) algorithm for approximating the Softmax activation function. An accurate implementation of Softmax on an FPGA can be highly resource-intensive and memory-hungry. To evaluate the effectiveness of ILS, we implement it on a Xilinx XCKU040 FPGA. Evaluations on CIFAR-10, CIFAR-100 and ImageNet show that ILS achieves up to $2.47\times$, $40\times$ and $323\times$ speedup over a CPU implementation, and $4\times$, $63\times$ and $51\times$ speedup over a GPU implementation, respectively. Compared with previous FPGA-based Softmax implementations, ILS strikes a better balance between resource consumption and accuracy.
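The abstract does not describe the internals of ILS, so the following is only a minimal sketch of what an integer-only Softmax approximation can look like, not the authors' algorithm. It assumes a common shift-friendly approach: subtract the maximum, replace $e^x$ with $2^{x \log_2 e}$, split the exponent into an integer part (a right shift) and a fractional part (a linear approximation), and normalize with a single integer division. The names `quantize`, `int_softmax_base2`, `FRAC_BITS`, `OUT_BITS` and `LOG2E_Q8` are hypothetical and chosen for illustration.

```python
# Illustrative integer-only softmax approximation (NOT the ILS algorithm
# from the paper; all names and bit widths here are assumptions).

FRAC_BITS = 8    # assumed fractional bits of the fixed-point inputs
OUT_BITS = 15    # assumed fractional bits of the output probabilities
LOG2E_Q8 = 369   # round(log2(e) * 256), i.e. log2(e) in Q8 fixed point


def quantize(xs, frac_bits=FRAC_BITS):
    """Convert floats to signed fixed-point integers."""
    return [int(round(x * (1 << frac_bits))) for x in xs]


def int_softmax_base2(q, frac_bits=FRAC_BITS, out_bits=OUT_BITS):
    """Approximate softmax over fixed-point integers using integer ops only.

    Returns probabilities as integers scaled by 2**out_bits.
    """
    m = max(q)  # subtracting the max keeps every exponent <= 0
    frac_mask = (1 << frac_bits) - 1
    terms = []
    for v in q:
        d = m - v                      # distance below the max, >= 0
        t = (d * LOG2E_Q8) >> 8        # d * log2(e), so e**-d == 2**-t
        int_part = t >> frac_bits      # integer part of the exponent
        frac = t & frac_mask           # fractional part f in [0, 1)
        # Linear approximation 2**-f ~= 1 - f/2, exact at f = 0 and f = 1.
        mant = (1 << out_bits) - ((frac << out_bits) >> (frac_bits + 1))
        # 2**-int_part is a plain right shift.
        terms.append(mant >> int_part)
    total = sum(terms)                 # >= 2**out_bits because of the max term
    # One integer division per element for normalization.
    return [(term << out_bits) // total for term in terms]


if __name__ == "__main__":
    probs = int_softmax_base2(quantize([1.0, 2.0, 3.0]))
    print([p / (1 << OUT_BITS) for p in probs])
```

In this sketch the exponential costs only a multiply, two shifts and a subtraction per element, which is the kind of saving that motivates integer approximations on FPGAs; the remaining division is the most expensive step, and a hardware design would likely approximate it as well.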