FPGA-based implementation of 2D Normalized Cross-Correlation for Large Scale Signals

2021 IEEE 6th International Forum on Research and Technology for Society and Industry (RTSI). Pub Date: 2021-09-06. DOI: 10.1109/rtsi50628.2021.9597359

Mirko Salaris, Andrea Damiani, Edoardo Putti, Luca Stornaiuolo


Abstract



Roughly every three years, high-end image resolution quadruples: what was called high resolution in 2018 is now becoming standard, and even simple embedded devices can shoot video at 4K. This push toward larger frames, combined with the rapid growth of Computer Vision (CV), constantly raises the performance requirements of the underlying processing. Template matching is one of the foundations of CV because it enables the localization of objects inside images. It relies on similarity functions such as the 2D cross-correlation and its variants, Normalized Cross-Correlation (NCC) and Zero-mean NCC (ZNCC). However, these computations do not scale gracefully with resolution. We propose two novel FPGA-based implementations of the 2D NCC and ZNCC for large-scale images, both with low hardware resource consumption. We succeeded in fitting our accelerator on the 3CG class of Xilinx Zynq UltraScale+ ARM-based MPSoCs, which is among the smallest embedded-grade classes and does not even include dedicated CV hardware (hardware that adds a 14% cost overhead). This enables accelerated template matching for local preprocessing in IoT applications. We achieve this while attaining a $3.52\times$ speedup over non-embedded systems and remaining $43.2\times$ more power efficient for NCC. Finally, to fully exploit the heterogeneous nature of our target hardware, we provide a runtime hardware selection algorithm that automatically targets the proper hardware/software implementation for best performance.
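To make the similarity functions named in the abstract concrete, the following is a minimal software reference sketch of NCC and ZNCC template matching in plain Python. It is not the authors' FPGA design, and the function names `zncc_map` and `best_match` are hypothetical, chosen only for illustration. For each placement of the template, the score is the dot product of the window and the template divided by the product of their Euclidean norms; ZNCC first subtracts the mean from both.

```python
import math


def zncc_map(image, template, zero_mean=True):
    """Score every valid placement of `template` inside `image`.

    Both arguments are lists of rows of numbers. With zero_mean=True the
    result is ZNCC; with zero_mean=False it is plain NCC.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    n = th * tw

    # Flatten the template once and precompute its (optionally centered) norm.
    t = [v for row in template for v in row]
    t_mean = sum(t) / n if zero_mean else 0.0
    t = [v - t_mean for v in t]
    t_norm = math.sqrt(sum(v * v for v in t))

    scores = []
    for y in range(ih - th + 1):
        row_scores = []
        for x in range(iw - tw + 1):
            # Extract the window under the template at offset (y, x).
            w = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(w) / n if zero_mean else 0.0
            w = [v - w_mean for v in w]
            w_norm = math.sqrt(sum(v * v for v in w))
            denom = w_norm * t_norm
            # A flat window or template has zero norm; report 0 rather than
            # dividing by zero.
            dot = sum(a * b for a, b in zip(w, t))
            row_scores.append(dot / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores


def best_match(scores):
    """Return the (row, col) offset with the highest score."""
    positions = (
        (y, x) for y in range(len(scores)) for x in range(len(scores[0]))
    )
    return max(positions, key=lambda p: scores[p[0]][p[1]])
```

In this naive form every placement costs one pass over the whole template, so the total work grows with the product of the image area and the template area. That is one way to see why the abstract says these computations do not scale gracefully with resolution.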
