面向深度学习边缘计算应用的低成本低功耗堆叠稀疏自编码器硬件加速

2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) Pub Date : 2020-09-01 DOI:10.1109/ATSIP49331.2020.9231748

T. Belabed, M. G. Coutinho, Marcelo A. C. Fernandes, C. Valderrama, C. Souani

{"title":"面向深度学习边缘计算应用的低成本低功耗堆叠稀疏自编码器硬件加速","authors":"T. Belabed, M. G. Coutinho, Marcelo A. C. Fernandes, C. Valderrama, C. Souani","doi":"10.1109/ATSIP49331.2020.9231748","DOIUrl":null,"url":null,"abstract":"Nowadays, Deep Learning DL becoming more and more interesting in many areas, such as genomics, security, data analysis, image, and video processing. However, DL requires more and more powerful and parallel computing. The calculation performed by super-machines equipped with powerful processors, such as the latest GPUs. Despite their power, these computing units consume a lot of energy, which makes their use very difficult in small embedded systems and edge computing. To overcome the problem for which we must keep the maximum performance and satisfy the power constraint, it is necessary to use a heterogeneous strategy. Some solutions are promising when using less energyconsuming electronic circuits, such as FPGAs associated with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA in the same chip. In the interest of flexibility, we decided to leverage the performance of Xilinx’s high-level synthesis tools, evaluate and choose the best solution in terms of size and performance of the data exchange, synchronization and pipeline processing. The results show that our implementation gives high performance at very low energy consumption. Indeed, the evaluation of our accelerator shows that it can classify 1160 MNIST images per second, consuming only 0.443 W; 2.4 W for the entire system. More than the low energy consumption and the high performance, the platform used only costs $ 125.","PeriodicalId":384018,"journal":{"name":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Low Cost and Low Power Stacked Sparse Autoencoder Hardware Acceleration for Deep Learning Edge Computing Applications\",\"authors\":\"T. Belabed, M. G. Coutinho, Marcelo A. C. Fernandes, C. Valderrama, C. Souani\",\"doi\":\"10.1109/ATSIP49331.2020.9231748\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, Deep Learning DL becoming more and more interesting in many areas, such as genomics, security, data analysis, image, and video processing. However, DL requires more and more powerful and parallel computing. The calculation performed by super-machines equipped with powerful processors, such as the latest GPUs. Despite their power, these computing units consume a lot of energy, which makes their use very difficult in small embedded systems and edge computing. To overcome the problem for which we must keep the maximum performance and satisfy the power constraint, it is necessary to use a heterogeneous strategy. Some solutions are promising when using less energyconsuming electronic circuits, such as FPGAs associated with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA in the same chip. In the interest of flexibility, we decided to leverage the performance of Xilinx’s high-level synthesis tools, evaluate and choose the best solution in terms of size and performance of the data exchange, synchronization and pipeline processing. The results show that our implementation gives high performance at very low energy consumption. Indeed, the evaluation of our accelerator shows that it can classify 1160 MNIST images per second, consuming only 0.443 W; 2.4 W for the entire system. More than the low energy consumption and the high performance, the platform used only costs $ 125.\",\"PeriodicalId\":384018,\"journal\":{\"name\":\"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ATSIP49331.2020.9231748\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ATSIP49331.2020.9231748","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

如今，深度学习在基因组学、安全、数据分析、图像和视频处理等许多领域变得越来越有趣。然而，深度学习需要越来越强大的并行计算能力。这些计算是由配备了强大处理器的超级机器完成的，比如最新的gpu。尽管它们很强大，但这些计算单元消耗了大量的能量，这使得它们在小型嵌入式系统和边缘计算中使用非常困难。为了解决既要保证最大性能又要满足功率限制的问题，有必要采用异构策略。当使用能耗更低的电子电路时，一些解决方案是有希望的，例如与较便宜的拓扑(如堆叠稀疏自编码器)相关的fpga。我们的目标架构是赛灵思ZYNQ 7020 SoC，它在同一芯片中结合了双核ARM处理器和FPGA。为了提高灵活性，我们决定利用Xilinx高级合成工具的性能，根据数据交换、同步和管道处理的大小和性能评估并选择最佳解决方案。结果表明，我们的实现在非常低的能耗下实现了高性能。事实上，对我们的加速器的评估表明，它每秒可以分类1160个MNIST图像，仅消耗0.443 W;整个系统2.4 W。除了低能耗和高性能之外，该平台的使用成本仅为125美元。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Low Cost and Low Power Stacked Sparse Autoencoder Hardware Acceleration for Deep Learning Edge Computing Applications

Nowadays, Deep Learning DL becoming more and more interesting in many areas, such as genomics, security, data analysis, image, and video processing. However, DL requires more and more powerful and parallel computing. The calculation performed by super-machines equipped with powerful processors, such as the latest GPUs. Despite their power, these computing units consume a lot of energy, which makes their use very difficult in small embedded systems and edge computing. To overcome the problem for which we must keep the maximum performance and satisfy the power constraint, it is necessary to use a heterogeneous strategy. Some solutions are promising when using less energyconsuming electronic circuits, such as FPGAs associated with less expensive topologies such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA in the same chip. In the interest of flexibility, we decided to leverage the performance of Xilinx’s high-level synthesis tools, evaluate and choose the best solution in terms of size and performance of the data exchange, synchronization and pipeline processing. The results show that our implementation gives high performance at very low energy consumption. Indeed, the evaluation of our accelerator shows that it can classify 1160 MNIST images per second, consuming only 0.443 W; 2.4 W for the entire system. More than the low energy consumption and the high performance, the platform used only costs $ 125.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)

自引率

0.00%

发文量