Low Power FPGA-SoC Design Techniques for CNN-based Object Detection Accelerator

Heekyung Kim, K. Choi
DOI: 10.1109/UEMCON47517.2019.8992929
Published in: 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON)
Publication date: 2019-10-01
URL: https://doi.org/10.1109/UEMCON47517.2019.8992929
Citations: 12

Abstract

This paper shows that existing low-power register-transfer level (RTL) techniques, in contrast to conventional techniques, can serve as an effective low-power design scheme for accelerating CNN-based object recognition systems. Most power-efficient design techniques for CNN acceleration focus on the high-level synthesis (HLS) level, such as memory bandwidth optimization, network architecture reconfiguration, data reuse, and batch normalization. However, these approaches have reached the limits of their effectiveness. Using the post-synthesis RTL code generated by the field-programmable gate array (FPGA) manufacturer's toolchain, we applied the proposed RTL low-power design technique to the original FIFO to reduce power consumption during data transformation. We compared the power consumption of the HLS-optimized design with that of the RTL-optimized design, configured a testbench for the modified FIFO module, and analyzed the estimated power dissipation. Among the resource categories, look-up tables (LUTs) and LUT RAM (LUTRAM) reduced power dissipation by 54% and 49%, respectively, although the increased use of block RAM (BRAM) raised BRAM power dissipation by 154%. As a result, total power consumption decreased by 10%. This paper also discusses the factors in FPGA system-on-chip (FPGA-SoC) design that affect the power consumption of CNN-based hardware implementations: the RTL architecture, the memory design architecture, and model-architecture-based hardware implementation methods. The additional virtual memory can sustain high throughput at full speed. In simulation, the low-power schemes applied to the Processing System (PS) and Programmable Logic (PL) architecture reduced the power consumption of the FIFO data transformation by 25.9%. We also found that the increased number of LUT blocks affects power efficiency and reduces the power consumption of the PL design by up to 49%.
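The per-category figures above (LUT -54%, LUTRAM -49%, BRAM +154%) combine into a much smaller net change (-10%) because each category's percentage change counts only in proportion to its share of the baseline total power: ΔP_total / P_total = Σ_i (P_i / P_total) · δ_i, where δ_i is the fractional change of category i. The following is a minimal Python sketch of that arithmetic, assuming the categories are independent and all remaining on-chip power is unchanged. The `Component` class, the `total_relative_change` function and the mW values in the usage section are hypothetical illustrations; the abstract does not report per-category baseline power, so these numbers are not the paper's data.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One on-chip power category, e.g. LUT, LUTRAM or BRAM (hypothetical model)."""
    name: str
    baseline_mw: float      # power of this category before optimization, in mW
    relative_change: float  # fractional change after optimization, e.g. -0.54 for -54%


def total_relative_change(components: list[Component], other_mw: float = 0.0) -> float:
    """Return the fractional change in total power.

    Each category's change is weighted by its share of the baseline total, so a
    large percentage change in a small category moves the total only a little.
    `other_mw` is baseline power in categories assumed to stay unchanged.
    """
    baseline_total = sum(c.baseline_mw for c in components) + other_mw
    if baseline_total <= 0:
        raise ValueError("baseline power must be positive")
    delta_mw = sum(c.baseline_mw * c.relative_change for c in components)
    return delta_mw / baseline_total


if __name__ == "__main__":
    # Percentages from the abstract; the baseline mW values are made up for illustration.
    parts = [
        Component("LUT", baseline_mw=40.0, relative_change=-0.54),
        Component("LUTRAM", baseline_mw=20.0, relative_change=-0.49),
        Component("BRAM", baseline_mw=10.0, relative_change=1.54),
    ]
    change = total_relative_change(parts, other_mw=130.0)
    print(f"net change in total power: {change:+.1%}")
```

With these invented baselines the net result differs from the paper's 10%; the point is only that the net figure depends on how much of the total each category contributes, which is why a +154% BRAM increase does not cancel the LUT and LUTRAM savings when BRAM is a small share of the budget.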