Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab
{"title":"使用 FPGA 的深度神经网络实时 P-SFA 硬件实现","authors":"Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab","doi":"10.1016/j.micpro.2024.105037","DOIUrl":null,"url":null,"abstract":"<div><p>Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proved their effectiveness in solving complex problems in many different applications and multiple fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block used is based on a probability-based sigmoid function approximation block (P-SFA) combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to use the least amount of hardware resources and area while maintaining a high recognition accuracy. The proposed AF module in this work consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design has achieved a recognition accuracy of 97.84 % on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and a recognition accuracy of 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs, a logical critical path delay of 8.853 ns. The power consumption of the P-SFA block is 6 mW and the PRT block is 5 mW.</p></div>","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"106 ","pages":"Article 105037"},"PeriodicalIF":1.9000,"publicationDate":"2024-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA\",\"authors\":\"Nour Elshahawy , Sandy A. Wasif , Maggie Mashaly , Eman Azab\",\"doi\":\"10.1016/j.micpro.2024.105037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proved their effectiveness in solving complex problems in many different applications and multiple fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block used is based on a probability-based sigmoid function approximation block (P-SFA) combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to use the least amount of hardware resources and area while maintaining a high recognition accuracy. The proposed AF module in this work consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design has achieved a recognition accuracy of 97.84 % on a 6-layer Deep Neural Network (DNN) for the MNIST dataset and a recognition accuracy of 88.58% on a 6-layer DNN for the FMNIST dataset. The proposed AF module has a total area of 1136 LUTs and 327 FFs, a logical critical path delay of 8.853 ns. 
The power consumption of the P-SFA block is 6 mW and the PRT block is 5 mW.</p></div>\",\"PeriodicalId\":49815,\"journal\":{\"name\":\"Microprocessors and Microsystems\",\"volume\":\"106 \",\"pages\":\"Article 105037\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2024-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Microprocessors and Microsystems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141933124000322\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Microprocessors and Microsystems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141933124000322","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
A Real-time P-SFA hardware implementation of Deep Neural Networks using FPGA
Machine Learning (ML) algorithms, specifically Artificial Neural Networks (ANNs), have proven effective at solving complex problems across many applications and fields. This paper focuses on optimizing the activation function (AF) block of the NN hardware architecture. The AF block is based on a probability-based sigmoid function approximation (P-SFA) block combined with a novel real-time probability module (PRT) that calculates the probability of the input data. The proposed NN design aims to minimize hardware resources and area while maintaining high recognition accuracy. The proposed AF module consists of two P-SFA blocks and the PRT component. The architecture proposed for implementing NNs is evaluated on Field Programmable Gate Arrays (FPGAs). The proposed design achieves a recognition accuracy of 97.84% with a 6-layer Deep Neural Network (DNN) on the MNIST dataset and 88.58% with a 6-layer DNN on the FMNIST dataset. The proposed AF module occupies 1136 LUTs and 327 FFs and has a logical critical path delay of 8.853 ns. The P-SFA block consumes 6 mW and the PRT block consumes 5 mW.
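The internals of the P-SFA and PRT blocks are not detailed in this abstract, so as background only, the sketch below illustrates a generic hardware-friendly sigmoid approximation (the classic PLAN piecewise-linear scheme) of the kind commonly mapped to FPGA LUTs, together with its error against the exact sigmoid. It is a hypothetical illustration, not the authors' P-SFA design; the function names and constants are those of the standard PLAN scheme and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    """Reference sigmoid, used only to measure approximation error."""
    return 1.0 / (1.0 + np.exp(-x))

def plan_sigmoid(x):
    """PLAN piecewise-linear sigmoid approximation, a common
    hardware-friendly scheme built from shifts and adds.
    This is NOT the paper's P-SFA/PRT method, only a generic example."""
    ax = np.abs(x)
    y = np.where(ax >= 5.0, 1.0,
        np.where(ax >= 2.375, 0.03125 * ax + 0.84375,
        np.where(ax >= 1.0, 0.125 * ax + 0.625,
                 0.25 * ax + 0.5)))
    # Exploit the symmetry sigma(-x) = 1 - sigma(x), so only |x| is approximated.
    return np.where(x >= 0.0, y, 1.0 - y)

if __name__ == "__main__":
    xs = np.linspace(-8.0, 8.0, 1601)
    err = np.abs(sigmoid(xs) - plan_sigmoid(xs))
    print(f"max abs error of PLAN vs. exact sigmoid: {err.max():.4f}")  # roughly 0.019
```

Because the segment slopes (0.25, 0.125, 0.03125) are powers of two, the multiplications reduce to bit shifts, which is why approximations of this style map efficiently to LUT-based FPGA fabrics.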
Journal Introduction:
Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects of embedded systems hardware. This includes embedded system hardware platforms ranging from custom hardware, through reconfigurable systems and application-specific processors, to general-purpose embedded processors. Special emphasis is placed on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC), and multi-processor systems on chip (MPSoC), as well as their memory and communication methods and structures, such as networks-on-chip (NoC).
Design automation of such systems, including methodologies, techniques, flows, and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central to this journal. While software is not the main focus, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with emphasis on hardware, are also within the journal's scope.