{"title":"基于FPGA的深度学习模型快速稀疏加速器设计","authors":"Shaotong Li, Yuhang Long","doi":"10.1117/12.2680554","DOIUrl":null,"url":null,"abstract":"At present, there have been many studies to design various CNN hardware accelerators to accelerate the inference of deep neural network models. The FPGA-based CNN reasoning accelerator can provide sufficient computing power support with flexible data accuracy, lower energy consumption and lower application cost, and has received a lot of attention in the application field of IoT terminal devices with limited computing power and energy consumption. Widespread concern. However, although the current FPGA-based CNN accelerator has greatly improved the speed of model reasoning through various methods, most of the methods cannot be effectively applied to actual terminal scenarios due to limitations in memory and energy consumption. In response to this situation, we designed an acceleration framework that takes into account both inference acceleration and energy consumption. Aiming at the limitation of computing power in the terminal environment, optimize a large number of multiplication operations in the convolution operation that consumes the most computing power in the CNN inference stage, by using local cache and matrix transformation formulas, and skipping pairings by zero values in the calculation process the model inference operation is further accelerated while reducing energy consumption. The experimental results show that compared with the current advanced neural network accelerator, not only the computing power has been significantly improved, but also the energy efficiency ratio has achieved better results. Moreover, this method can not only be implemented in FPGA, but also be migrated to other embedded terminals.","PeriodicalId":201466,"journal":{"name":"Symposium on Advances in Electrical, Electronics and Computer Engineering","volume":"23 16","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Design of fast and sparse accelerator for deep learning model based on FPGA\",\"authors\":\"Shaotong Li, Yuhang Long\",\"doi\":\"10.1117/12.2680554\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"At present, there have been many studies to design various CNN hardware accelerators to accelerate the inference of deep neural network models. The FPGA-based CNN reasoning accelerator can provide sufficient computing power support with flexible data accuracy, lower energy consumption and lower application cost, and has received a lot of attention in the application field of IoT terminal devices with limited computing power and energy consumption. Widespread concern. However, although the current FPGA-based CNN accelerator has greatly improved the speed of model reasoning through various methods, most of the methods cannot be effectively applied to actual terminal scenarios due to limitations in memory and energy consumption. In response to this situation, we designed an acceleration framework that takes into account both inference acceleration and energy consumption. 
Aiming at the limitation of computing power in the terminal environment, optimize a large number of multiplication operations in the convolution operation that consumes the most computing power in the CNN inference stage, by using local cache and matrix transformation formulas, and skipping pairings by zero values in the calculation process the model inference operation is further accelerated while reducing energy consumption. The experimental results show that compared with the current advanced neural network accelerator, not only the computing power has been significantly improved, but also the energy efficiency ratio has achieved better results. Moreover, this method can not only be implemented in FPGA, but also be migrated to other embedded terminals.\",\"PeriodicalId\":201466,\"journal\":{\"name\":\"Symposium on Advances in Electrical, Electronics and Computer Engineering\",\"volume\":\"23 16\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Symposium on Advances in Electrical, Electronics and Computer Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2680554\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Symposium on Advances in Electrical, Electronics and Computer Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2680554","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design of fast and sparse accelerator for deep learning model based on FPGA
Many recent studies have designed CNN hardware accelerators to speed up the inference of deep neural network models. FPGA-based CNN inference accelerators can provide sufficient computing power with flexible numerical precision, low energy consumption, and low application cost, and have therefore attracted widespread attention for IoT terminal devices, where computing power and energy budgets are limited. However, although current FPGA-based CNN accelerators have greatly improved inference speed through a variety of methods, most of these methods cannot be applied effectively in real terminal scenarios because of memory and energy constraints. In response, we designed an acceleration framework that addresses both inference speed and energy consumption. To cope with the limited computing power of the terminal environment, we optimize the large number of multiplications in the convolution operation, which dominates the computational cost of the CNN inference stage: a local cache and matrix transformation formulas reduce the number of multiplications, and zero-valued operand pairings are skipped during computation, further accelerating model inference while reducing energy consumption. Experimental results show that, compared with current state-of-the-art neural network accelerators, our design achieves both significantly higher throughput and a better energy efficiency ratio. Moreover, the method is not tied to FPGAs and can be migrated to other embedded terminals.
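The abstract does not spell out which matrix transformation is meant. A common choice for reducing convolution multiplications is the Winograd minimal filtering algorithm, so the Python sketch below illustrates that technique under that assumption; the function name and test values are hypothetical and not from the paper. Winograd F(2,3) produces two outputs of a 3-tap convolution with 4 multiplications instead of 6, and the transformed filter can be computed once and held in a local cache for reuse across input tiles.

```python
# Hypothetical sketch: Winograd F(2,3) minimal filtering, one plausible
# reading of the "matrix transformation formulas" in the abstract.
# Two outputs of a 3-tap 1D convolution with 4 multiplications instead of 6.

def winograd_f23(d, g):
    """d: 4 input values, g: 3 filter taps -> 2 convolution outputs."""
    # Filter transform: precomputable once per filter and kept in a
    # local cache, since the filter is reused across all input tiles.
    g0, g1, g2 = g
    u0 = g0
    u1 = (g0 + g1 + g2) / 2.0
    u2 = (g0 - g1 + g2) / 2.0
    u3 = g2
    # Input transform and the 4 elementwise multiplications.
    d0, d1, d2, d3 = d
    m0 = (d0 - d2) * u0
    m1 = (d1 + d2) * u1
    m2 = (d2 - d1) * u2
    m3 = (d1 - d3) * u3
    # Output transform: additions only from here on.
    return [m0 + m1 + m2, m1 - m2 - m3]

# Sanity check against the direct 6-multiplication convolution.
d, g = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0]
direct = [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
assert all(abs(a - b) < 1e-9 for a, b in zip(winograd_f23(d, g), direct))
```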
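Likewise, "skipping pairings by zero values" describes a zero-gating multiply-accumulate: whenever either operand of a pairing is zero, the product is known to be zero, so the multiplier can be bypassed. A minimal software sketch follows (names are illustrative; in hardware the guard would gate the multiplier to save switching energy rather than merely skip an instruction):

```python
# Hypothetical sketch of zero-value skipping in a multiply-accumulate loop.
# After ReLU, many activations are exactly zero, and pruned weights add
# further zeros, so a large fraction of pairings can be skipped.

def sparse_dot(weights, activations):
    """Dot product that skips any pairing in which either operand is zero."""
    acc = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:  # zero operand: bypass the multiplier entirely
            continue
        acc += w * a
    return acc

print(sparse_dot([3, 0, -2, 5], [0, 7, 4, 1]))  # only 2 of 4 multiplications fire
```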