A Dataflow Pipelining Architecture for Tile Segmentation with a Sparse MobileNet on an FPGA

2019 International Conference on Field-Programmable Technology (ICFPT), Pub Date: 2019-12-01, DOI: 10.1109/ICFPT47387.2019.00044

Youki Sada, Masayuki Shimoda, Akira Jinguji, Hiroki Nakahara


Citations: 1

Abstract


Fast semantic segmentation on embedded systems is needed because of growing interest in autonomous driving, and energy efficiency is a fundamental metric in this setting. Because high segmentation accuracy requires high-resolution images, an accelerator must provide large buffers for the corresponding feature maps, which is ill-suited to the limited on-chip memory of an FPGA. To address this, we propose a tile segmentation algorithm and develop an FPGA-based accelerator for a sparse MobileNet-based PSPNet. To reduce the buffer size, the tile segmentation algorithm splits each incoming image into a number of tiles determined by the stride on the FPGA, and the PSPNet then processes the resulting tiles. We also propose a pipelined sparse convolutional circuit that computes multiple tiles at high speed. We compared the proposed FPGA-based system with an NVIDIA RTX 2080 Ti on the Cityscapes benchmark. For a 1024×512 image, the FPGA achieved 139 FPS at 24 W power consumption with an accuracy (mIoU) of 64.2%. Compared with the GPU, it was 1.5 times faster, consumed 9.3 times less power, and delivered 13.8 times better performance per watt.
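The buffer-size argument behind tile segmentation can be illustrated with a minimal Python sketch. The abstract states only that images are split into a number of tiles determined by the stride; the grid shape, the equal non-overlapping tiles, the rule that each tile side must be a multiple of the network's total stride, the channel count, and the helper names `split_into_tiles` and `feature_buffer_elems` are all hypothetical assumptions for illustration, not the authors' implementation. The sketch also ignores any halo (border overlap) that convolutions at tile edges might need.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tile:
    row: int     # tile index along the image height
    col: int     # tile index along the image width
    y0: int      # top pixel coordinate in the full frame
    x0: int      # left pixel coordinate in the full frame
    height: int
    width: int


def split_into_tiles(height: int, width: int, tiles_y: int, tiles_x: int,
                     total_stride: int) -> List[Tile]:
    """Split an H x W frame into a tiles_y x tiles_x grid of equal,
    non-overlapping tiles (hypothetical scheme).

    Assumption, not from the paper: each tile side must be a multiple of
    the network's total stride so every tile downsamples to an
    integer-sized feature map.
    """
    if height % tiles_y or width % tiles_x:
        raise ValueError("frame must divide evenly into the tile grid")
    th, tw = height // tiles_y, width // tiles_x
    if th % total_stride or tw % total_stride:
        raise ValueError("tile size must be a multiple of the total stride")
    return [Tile(r, c, r * th, c * tw, th, tw)
            for r in range(tiles_y) for c in range(tiles_x)]


def feature_buffer_elems(height: int, width: int, channels: int) -> int:
    """Number of elements needed to hold one H x W x C feature map on chip."""
    return height * width * channels


if __name__ == "__main__":
    H, W = 512, 1024  # 1024x512 frame size reported in the abstract (W x H)
    C = 32            # hypothetical channel count of a full-resolution map
    tiles = split_into_tiles(H, W, tiles_y=2, tiles_x=4, total_stride=32)
    full = feature_buffer_elems(H, W, C)
    per_tile = feature_buffer_elems(tiles[0].height, tiles[0].width, C)
    print(len(tiles), tiles[0].height, tiles[0].width)  # 8 256 256
    print(full // per_tile)                             # 8
```

Under these assumptions, a 2×4 grid over a 1024×512 frame yields eight 256×256 tiles, so an on-chip buffer sized for one tile needs one eighth of the full-frame feature-map storage; the tiles can then be streamed one after another through the same pipeline.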
