Low Power FPGA-SoC Design Techniques for CNN-based Object Detection Accelerator

2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). Pub Date: 2019-10-01. DOI: 10.1109/UEMCON47517.2019.8992929

Heekyung Kim, K. Choi


Citations: 12


Low Power FPGA-SoC Design Techniques for CNN-based Object Detection Accelerator

This paper shows that existing low-power register-transfer-level (RTL) techniques can be effective as a low-power design scheme for accelerating CNN-based object recognition systems, in contrast to conventional techniques. Most power-efficient design techniques for CNN acceleration focus on the high-level synthesis (HLS) aspect, such as memory bandwidth optimization, network architecture reconfiguration, data reuse, and batch normalization. However, these approaches have reached the limits of their effectiveness. Using the post-synthesis RTL code generated by field-programmable gate array (FPGA) vendor tools, we applied the proposed low-power RTL design technique to the original FIFO block to reduce power consumption during data transformation. We compared the HLS-optimized result with the RTL-optimized result in terms of power consumption. We configured a testbench for the modified FIFO module and analyzed the estimated power dissipation. Power dissipation in the look-up tables (LUTs) and the look-up table RAM (LUTRAM) was reduced by 54% and 49%, respectively, although the increased block RAM (BRAM) raised power dissipation by 154%. Overall, total power consumption was reduced by 10%. This paper discusses factors in FPGA with system-on-chip (FPGA-SoC) design that affect the power consumption of CNN-based hardware implementations, such as the RTL architecture, the memory design architecture, and model-architecture-based hardware implementation methods. The virtual additional memory can support high throughput at full speed. Our simulated low-power schemes, applied to the Processing System (PS) and Programmable Logic (PL) architecture, reduced power consumption by 25.9% in the FIFO data transformation. We found that the increased number of LUT blocks affects power efficiency and reduces the power consumption of the PL design by up to 49%.
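The per-resource figures (LUT and LUTRAM power down, BRAM power up) and the smaller overall saving can be reconciled with simple weighted arithmetic: each resource's relative change contributes in proportion to its share of the baseline power. The sketch below illustrates only that arithmetic. The baseline shares, the `other` category, and the function name `total_power_change` are assumptions made for illustration; the abstract does not report a per-resource power breakdown, so the printed total is not a reproduction of the paper's result.

```python
def total_power_change(baseline_share, relative_change):
    """Return the relative change in total power.

    baseline_share:  component -> fraction of baseline total power (must sum to 1).
    relative_change: component -> relative change of that component's power
                     (-0.54 means a 54% reduction, +1.54 a 154% increase).
                     Components missing from this mapping are treated as unchanged.
    """
    if abs(sum(baseline_share.values()) - 1.0) > 1e-9:
        raise ValueError("baseline shares must sum to 1")
    return sum(share * relative_change.get(name, 0.0)
               for name, share in baseline_share.items())


# Hypothetical baseline shares, NOT taken from the paper.
hypothetical_share = {"LUT": 0.30, "LUTRAM": 0.20, "BRAM": 0.10, "other": 0.40}

# Per-resource changes quoted in the abstract; "other" is assumed unchanged.
reported_change = {"LUT": -0.54, "LUTRAM": -0.49, "BRAM": 1.54}

change = total_power_change(hypothetical_share, reported_change)
print(f"total power change under the assumed shares: {change:+.1%}")
```

With a different baseline split the total shifts accordingly, which is why a large relative increase in one resource need not cancel the savings in the others.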

