Low Cost and Low Power Stacked Sparse Autoencoder Hardware Acceleration for Deep Learning Edge Computing Applications

T. Belabed, M. G. Coutinho, Marcelo A. C. Fernandes, C. Valderrama, C. Souani
Published in: 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), September 2020
DOI: https://doi.org/10.1109/ATSIP49331.2020.9231748
Citations: 3

Abstract

Deep Learning (DL) is attracting growing interest in many areas, such as genomics, security, data analysis, and image and video processing. However, DL demands increasingly powerful, parallel computing. This computation is usually performed by high-end machines equipped with powerful processors, such as the latest GPUs. Despite their performance, these computing units consume a lot of energy, which makes them very difficult to use in small embedded systems and edge computing. To maintain maximum performance while satisfying the power constraint, a heterogeneous strategy is necessary. Promising solutions pair less energy-consuming circuits, such as FPGAs, with less expensive network topologies, such as Stacked Sparse Autoencoders. Our target architecture is the Xilinx ZYNQ 7020 SoC, which combines a dual-core ARM processor and an FPGA on the same chip. In the interest of flexibility, we use Xilinx's high-level synthesis tools to evaluate and choose the best solution in terms of size and performance for data exchange, synchronization, and pipeline processing. The results show that our implementation delivers high performance at very low energy consumption: the accelerator can classify 1160 MNIST images per second while consuming only 0.443 W (2.4 W for the entire system). In addition to the low energy consumption and high performance, the platform used costs only $125.
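The throughput and power figures quoted above can be combined into an energy-per-image estimate. The following is a minimal sketch of that arithmetic, not a figure reported by the authors: it assumes the quoted power values are averages sustained at the quoted throughput, and the function and constant names are illustrative.

```python
def energy_per_image_mj(power_w: float, images_per_s: float) -> float:
    """Average energy per classified image, in millijoules.

    Energy per image (J) = power (W) / throughput (images/s).
    Assumes the power figure is an average sustained at that throughput.
    """
    if images_per_s <= 0:
        raise ValueError("throughput must be positive")
    return power_w / images_per_s * 1e3


# Figures taken from the abstract.
THROUGHPUT_IMG_PER_S = 1160.0  # MNIST images classified per second
ACCEL_POWER_W = 0.443          # accelerator only
SYSTEM_POWER_W = 2.4           # entire system

accel_mj = energy_per_image_mj(ACCEL_POWER_W, THROUGHPUT_IMG_PER_S)
system_mj = energy_per_image_mj(SYSTEM_POWER_W, THROUGHPUT_IMG_PER_S)

print(f"accelerator: {accel_mj:.3f} mJ/image")
print(f"system:      {system_mj:.3f} mJ/image")
```

Under these assumptions the accelerator spends roughly 0.38 mJ per image and the whole system roughly 2.07 mJ per image.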