Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab
{"title":"使用 FPGA 的深度神经网络实时 P-SFA 硬件实现","authors":"Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab","doi":"10.1016/j.micpro.2024.105037","DOIUrl":null,"url":null,"abstract":"<div><p>Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proved their effectiveness in solving complex problems in many different applications and multiple fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block used is based on a probability-based sigmoid function approximation block (P-SFA) combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to use the least amount of hardware resources and area while maintaining a high recognition accuracy. The proposed AF module in this work consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design has achieved a recognition accuracy of 97.84 % on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and a recognition accuracy of 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs, a logical critical path delay of 8.853 ns. The power consumption of the P-SFA block is 6 mW and the PRT block is 5 mW.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105037"},"PeriodicalIF":1.9000,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA\",\"authors\":\"Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab\",\"doi\":\"10.1016/j.micpro.2024.105037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proved their effectiveness in solving complex problems in many different applications and multiple fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block used is based on a probability-based sigmoid function approximation block (P-SFA) combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to use the least amount of hardware resources and area while maintaining a high recognition accuracy. The proposed AF module in this work consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design has achieved a recognition accuracy of 97.84 % on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and a recognition accuracy of 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs, a logical critical path delay of 8.853 ns. 
The power consumption of the P-SFA block is 6 mW and the PRT block is 5 mW.</p></div>\",\"PeriodicalId\":49815,\"journal\":{\"name\":\"Microprocessors and Microsystems\",\"volume\":\"106 \",\"pages\":\"Article 105037\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2024-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microprocessors and Microsystems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141933124000322\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microprocessors and Microsystems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141933124000322","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA
Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proven effective at solving complex problems across many applications and fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block is based on a probability-based sigmoid function approximation (P-SFA) block combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to minimize hardware resources and area while maintaining high recognition accuracy. The proposed AF module consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design achieves a recognition accuracy of 97.84% with a 6-layer Deep Neural Network (DNN) on the MNIST dataset and 88.58% with a 6-layer DNN on the FMNIST dataset. The proposed AF module occupies 1136 LUTs and 327 FFs and has a logical critical path delay of 8.853 ns. The P-SFA block consumes 6 mW and the PRT block consumes 5 mW.
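The internals of the P-SFA and PRT blocks are not detailed in this abstract, so as background only, the sketch below illustrates a generic hardware-friendly sigmoid approximation (the classic PLAN piecewise-linear scheme) of the kind commonly mapped to FPGA LUTs, together with its error against the exact sigmoid. It is a hypothetical illustration, not the authors' P-SFA design; the function names and constants are those of the standard PLAN scheme and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    """Reference sigmoid, used only to measure approximation error."""
    return 1.0 / (1.0 + np.exp(-x))

def plan_sigmoid(x):
    """PLAN piecewise-linear sigmoid approximation, a common
    hardware-friendly scheme built from shifts and adds.
    This is NOT the paper's P-SFA/PRT method, only a generic example."""
    ax = np.abs(x)
    y = np.where(ax >= 5.0, 1.0,
        np.where(ax >= 2.375, 0.03125 * ax + 0.84375,
        np.where(ax >= 1.0, 0.125 * ax + 0.625,
                 0.25 * ax + 0.5)))
    # Exploit the symmetry sigma(-x) = 1 - sigma(x), so only |x| is approximated.
    return np.where(x >= 0.0, y, 1.0 - y)

if __name__ == "__main__":
    xs = np.linspace(-8.0, 8.0, 1601)
    err = np.abs(sigmoid(xs) - plan_sigmoid(xs))
    print(f"max abs error of PLAN vs. exact sigmoid: {err.max():.4f}")  # roughly 0.019
```

Because the segment slopes (0.25, 0.125, 0.03125) are powers of two, the multiplications reduce to bit shifts, which is why approximations of this style map efficiently to LUT-based FPGA fabrics.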
Journal Introduction:
Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects of embedded systems hardware. This includes embedded system hardware platforms ranging from custom hardware, through reconfigurable systems and application-specific processors, to general-purpose embedded processors. Special emphasis is placed on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC), and multi-processor systems on chip (MPSoC), as well as their memory and communication methods and structures, such as networks-on-chip (NoC).
Design automation of such systems, including methodologies, techniques, flows, and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central to this journal. While software is not the main focus, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with emphasis on hardware, are also within the journal's scope.