Oscar Rahnama, Tommaso Cavallari, S. Golodetz, Simon Walker, Philip H. S. Torr
{"title":"R3SGM: Real-Time Raster-Respecting Semi-Global Matching for Power-Constrained Systems","authors":"Oscar Rahnama, Tommaso Cavallari, S. Golodetz, Simon Walker, Philip H. S. Torr","doi":"10.1109/FPT.2018.00025","DOIUrl":"https://doi.org/10.1109/FPT.2018.00025","url":null,"abstract":"Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is hard to achieve with conventional hardware, making the use of embedded devices such as FPGAs attractive for low-power applications. However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA context, the accuracy of SGM has been improved by More Global Matching (MGM), which also helps tackle the streaking artifacts that afflict SGM. In this paper, we propose a novel, resource-efficient method that is inspired by MGM's techniques for improving depth quality, but which can be implemented to run in real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI and Middlebury), we show that in comparison to other real-time capable stereo approaches, we can achieve a state-of-the-art balance between accuracy, power efficiency and speed, making our approach highly desirable for use in real-time systems with limited power.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124024608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Pipelined Dataflow Accelerator for Object Region Proposals on FPGA Platform","authors":"Wenzhi Fu, Jianlei Yang, Pengcheng Dai, Yiran Chen, Weisheng Zhao","doi":"10.1109/FPT.2018.00070","DOIUrl":"https://doi.org/10.1109/FPT.2018.00070","url":null,"abstract":"Region proposal is critical for object detection while it usually poses a bottleneck in improving the computation efficiency on traditional control-flow architectures. We have observed region proposal tasks are potentially suitable for performing pipelined parallelism by exploiting dataflow driven acceleration. In this paper, a scalable pipelined dataflow accelerator is proposed for efficient region proposals on FPGA platform. The accelerator processes image data by a streaming manner with three sequential stages: resizing, kernel computing and sorting. First, Ping-Pong cache strategy is adopted for rotation loading in resize module to guarantee continuous output streaming. Then, a multiple pipelines architecture with tiered memory is utilized in kernel computing module to complete the main computation tasks. Finally, a bubble-pushing heap sort method is exploited in sorting module to find the top-k largest candidates efficiently. Our design is implemented with high level synthesis on FPGA platforms, and experimental re-sults on VOC2007 datasets show that it could achieve about 3.67X speedups than traditional desktop CPU platform and >250X energy efficiency improvement than embedded ARM platform.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122428743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Live Migration for OpenCL FPGA Accelerators","authors":"Anuj Vaishnav, K. Pham, Dirk Koch","doi":"10.1109/FPT.2018.00017","DOIUrl":"https://doi.org/10.1109/FPT.2018.00017","url":null,"abstract":"FPGAs are currently being deployed at a large scale across data-centres for various applications because of their performance and power benefits. In particular, cloud service operators are now offering FPGAs as a Service. However, to completely integrate FPGAs in a data-centre environment like standard software systems, support for fault tolerance and task migration is essential. In this paper, we propose a live migration technique for FPGA accelerators to provide support for fault tolerance, system maintenance, and resource management. Our technique allows migration of OpenCL accelerators not only within a single FPGA but also across FPGAs with zero downtime. It achieves this by overlapping the computation with datamovements transparently from the user for OpenCL kernels. Moreover, distributed check-pointing mechanisms can be employed to recover from unknown faults with minimal loss of completed work. Altogether it enables system updates such as changing the static FPGA configuration or upgrading the OS without an interruption of service.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130145119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large Utility Sorting on FPGAs","authors":"Kristiyan Manev, Dirk Koch","doi":"10.1109/FPT.2018.00067","DOIUrl":"https://doi.org/10.1109/FPT.2018.00067","url":null,"abstract":"This paper presents a merge sorter able of merging thousands of streams in a single run where the logic cost scales logarithmic with the number of streams merged. Moreover, we apply several performance tuning techniques, including speculative execution, deep pipelining and optimized communication schemes between processing elements. An end-to-end case study utilizing a Xilinx VC709 board merges 2048 sequences of the Graysort benchmark between two DRAMs at 9.5GB/s or 1024 sequences at 10.3GB/s effective throughput.","PeriodicalId":434541,"journal":{"name":"2018 International Conference on Field-Programmable Technology (FPT)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115127977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}