Yingxiang Li, Yingke Gao, Zhiwen Su, Shi-tao Chen, Longjun Liu
2022 China Automation Congress (CAC), published 2022-11-25. DOI: https://doi.org/10.1109/CAC57257.2022.10054761
FPGA Accelerated Real-time Recurrent All-Pairs Field Transforms for Optical Flow
Deep-learning-based optical flow algorithms have achieved excellent performance on multiple datasets, opening new opportunities for optical flow estimation. Recurrent All-Pairs Field Transforms (RAFT) is one of the most powerful deep-network-based optical flow algorithms, but it is difficult to run in real time on resource-limited embedded platforms. In this paper, we propose RAFT-Lite, a compressed version of the original RAFT model that is more lightweight and better suited to hardware deployment. We further propose an FPGA hardware acceleration architecture for RAFT-Lite, whose scheduling strategy for the convolutions in RAFT enables efficient pipelining and resource reuse. On a Xilinx ZCU102 evaluation board, the accelerated system reaches 10.4 fps on images with a resolution of 512×396, which is 6.8x the frame rate of an i7-10700 at 2.90 GHz and 46x that of an ARM Cortex-A53 at 1.50 GHz. The power consumption is 13.103 W.
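The reported figures imply a few derived quantities that make the comparison easier to read. The Python sketch below is only back-of-the-envelope arithmetic on the numbers in the abstract. It assumes that the 6.8x and 46x factors are ratios of frame rates and that 13.103 W is the average power drawn while processing; the abstract states neither explicitly. The function and key names are illustrative and do not come from the paper.

```python
# Figures reported in the abstract.
FPGA_FPS = 10.4          # frames per second on the ZCU102
POWER_W = 13.103         # reported power consumption in watts
SPEEDUP_VS_I7 = 6.8      # reported factor over the i7-10700
SPEEDUP_VS_A53 = 46.0    # reported factor over the ARM Cortex-A53
WIDTH, HEIGHT = 512, 396  # input resolution in pixels


def derived_metrics():
    """Derive per-frame figures from the reported numbers.

    Assumption: the speedup factors are frame-rate ratios, and the
    power figure is an average over the processing time.
    """
    return {
        # Time spent on one frame, in milliseconds.
        "frame_time_ms": 1000.0 / FPGA_FPS,
        # Energy per frame in joules (watts divided by frames per second).
        "energy_per_frame_j": POWER_W / FPGA_FPS,
        # Pixel throughput at the reported resolution.
        "pixels_per_second": WIDTH * HEIGHT * FPGA_FPS,
        # Implied baseline frame rates under the ratio assumption.
        "i7_fps": FPGA_FPS / SPEEDUP_VS_I7,
        "a53_fps": FPGA_FPS / SPEEDUP_VS_A53,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the accelerator spends roughly 96 ms and about 1.26 J per frame, and the implied CPU baselines run at about 1.5 fps (i7-10700) and 0.23 fps (Cortex-A53). These are derived estimates, not measurements reported by the authors.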