Authors: Youki Sada, Masayuki Shimoda, Akira Jinguji, Hiroki Nakahara
Venue: 2019 International Conference on Field-Programmable Technology (ICFPT), published 2019-12-01
DOI: 10.1109/ICFPT47387.2019.00044 (https://doi.org/10.1109/ICFPT47387.2019.00044)
A Dataflow Pipelining Architecture for Tile Segmentation with a Sparse MobileNet on an FPGA
Fast semantic segmentation on embedded systems is needed because of the growing interest in automated driving, and energy efficiency is a fundamental metric in that setting. High segmentation accuracy requires high-resolution images, so an accelerator must hold large buffers for the corresponding feature maps, which does not suit the limited on-chip memory of an FPGA. To address this, we propose a tile segmentation algorithm and develop an FPGA-based accelerator for a sparse MobileNet-based PSPNet. To reduce the buffer size, the tile segmentation algorithm splits each incoming image on the FPGA into a number of tiles determined by the stride. The PSPNet then processes the resulting tiles. Moreover, we propose a pipelined sparse convolutional circuit to compute multiple tiles at high speed. We compared the proposed FPGA-based system with the NVIDIA RTX 2080 Ti using the Cityscapes benchmark. The FPGA achieved 139 FPS with 24 W power consumption for a 1024×512 image, and its accuracy (mIoU) was 64.2%. Compared with the GPU, it was 1.5 times faster, its power consumption was 9.3 times lower, and its performance per power consumption was 13.8 times better.
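To make the buffer-size argument concrete, the following is a minimal sketch of splitting an image into tiles and comparing feature-map footprints. It is not the authors' algorithm: the abstract states that the tile count is determined by the stride but does not give the rule, so the 256×256 tile size, the non-overlapping layout, the 32-channel feature map, and the helper names `tile_grid` and `feature_map_elems` are all assumptions chosen for illustration. Only the 1024×512 image size comes from the abstract.

```python
from typing import Iterator, Tuple

# (top, left, height, width) of one tile, in pixels.
Tile = Tuple[int, int, int, int]


def tile_grid(height: int, width: int, tile_h: int, tile_w: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles covering a height x width image, row-major.

    Assumption: tiles do not overlap, and edge tiles are simply clipped.
    The paper's stride-based rule for choosing the tile count is not modeled.
    """
    if tile_h <= 0 or tile_w <= 0:
        raise ValueError("tile dimensions must be positive")
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            yield (top, left,
                   min(tile_h, height - top),
                   min(tile_w, width - left))


def feature_map_elems(height: int, width: int, channels: int) -> int:
    """Number of elements a buffer must hold for one feature map."""
    return height * width * channels


# Image size from the abstract: 1024 wide, 512 high.
IMG_H, IMG_W = 512, 1024
# Hypothetical tile size and channel count, for illustration only.
TILE_H, TILE_W, CHANNELS = 256, 256, 32

tiles = list(tile_grid(IMG_H, IMG_W, TILE_H, TILE_W))
full_buffer = feature_map_elems(IMG_H, IMG_W, CHANNELS)
tile_buffer = feature_map_elems(TILE_H, TILE_W, CHANNELS)

print(f"tiles per image: {len(tiles)}")
print(f"buffer reduction factor: {full_buffer // tile_buffer}")
```

Under these assumed sizes, the image splits into 8 tiles, and a buffer sized for one tile holds one eighth of the elements needed for the full-resolution feature map. A real design would also have to account for pixels near tile borders, which this sketch ignores.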