T. Belabed, M. G. Coutinho, Marcelo A. C. Fernandes, C. Valderrama, C. Souani
2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), September 2020. DOI: 10.1109/ATSIP49331.2020.9231748
Low Cost and Low Power Stacked Sparse Autoencoder Hardware Acceleration for Deep Learning Edge Computing Applications
Nowadays, Deep Learning (DL) is attracting growing interest in many areas, such as genomics, security, data analysis, and image and video processing. However, DL requires ever more powerful and parallel computing, typically performed by machines equipped with powerful processors such as the latest GPUs. Despite their performance, these computing units consume a lot of energy, which makes them very difficult to use in small embedded systems and edge computing. To keep maximum performance while satisfying the power constraint, a heterogeneous strategy is necessary. Promising solutions combine less energy-consuming electronic circuits, such as FPGAs, with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA on the same chip. In the interest of flexibility, we leverage Xilinx's high-level synthesis tools to evaluate and choose the best solution in terms of size and performance for data exchange, synchronization, and pipeline processing. The results show that our implementation delivers high performance at very low energy consumption: the accelerator can classify 1160 MNIST images per second while consuming only 0.443 W (2.4 W for the entire system). Beyond the low energy consumption and high performance, the platform used costs only $125.
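For readers unfamiliar with the topology being accelerated, the forward pass of a stacked sparse autoencoder classifier can be sketched as below. This is a minimal illustrative sketch in NumPy, not the paper's FPGA implementation: the layer sizes (784-64-32-10) and random weights are hypothetical stand-ins, chosen only to show the structure of stacked encoder layers followed by a classification layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer sizes: 784 MNIST pixels in, two stacked
# encoder layers, 10 digit classes out.
layer_sizes = [784, 64, 32, 10]
weights = [rng.standard_normal((m, n)) * 0.01
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def ssae_forward(x):
    # Each stacked encoder layer: affine transform + sigmoid activation.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = sigmoid(x @ W + b)
    # Final classification layer: softmax over the 10 classes.
    z = x @ weights[-1] + biases[-1]
    e = np.exp(z - z.max())
    return e / e.sum()

probs = ssae_forward(rng.random(784))
print(probs.shape)  # one probability per digit class
```

As a sanity check on the reported figures, the accelerator's energy per classified image works out to roughly 0.443 W / 1160 images/s ≈ 0.38 mJ per image.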