A Dataflow Pipelining Architecture for Tile Segmentation with a Sparse MobileNet on an FPGA

Youki Sada, Masayuki Shimoda, Akira Jinguji, Hiroki Nakahara
2019 International Conference on Field-Programmable Technology (ICFPT), published 2019-12-01
DOI: 10.1109/ICFPT47387.2019.00044 (https://doi.org/10.1109/ICFPT47387.2019.00044)
Citations: 1

Abstract

Fast semantic segmentation on embedded systems is needed because of the growing interest in automated driving, and energy efficiency is a fundamental metric in that setting. Because high-resolution images are required for high segmentation accuracy, an accelerator must hold large buffers for the corresponding feature maps, which does not suit the limited on-chip memory of an FPGA. To address this, we propose a tile segmentation algorithm and develop an FPGA-based accelerator for a sparse MobileNet-based PSPNet. To reduce the buffer size, the tile segmentation algorithm splits each incoming image on the FPGA into a number of tiles determined by the stride. The PSPNet then processes the many split tiles. Moreover, we propose a pipelined sparse convolutional circuit that computes multiple tiles at high speed. We compared the proposed FPGA-based system with an NVIDIA RTX 2080 Ti on the Cityscapes benchmark. The FPGA achieved 139 FPS at 24 W power consumption for a 1024×512 image, with an accuracy (mIoU) of 64.2%. Compared with the GPU, it was 1.5 times faster, consumed 9.3 times less power, and delivered 13.8 times better performance per unit of power.
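The reported ratios can be cross-checked with simple arithmetic. The sketch below uses only the figures quoted in the abstract. The GPU frame rate and GPU power are not stated there, so the values derived here are back-calculated estimates under the assumption that the ratios are exact, not measurements from the paper. All variable names are illustrative.

```python
# Figures quoted in the abstract.
fpga_fps = 139.0                  # frames per second on the FPGA (1024x512 input)
fpga_power_w = 24.0               # FPGA power consumption in watts
speedup_vs_gpu = 1.5              # FPGA is 1.5 times faster than the GPU
power_ratio_vs_gpu = 9.3          # FPGA consumes 9.3 times less power than the GPU
reported_efficiency_gain = 13.8   # reported performance-per-power advantage

# Derived values. These are estimates: the abstract does not state them.
fpga_fps_per_watt = fpga_fps / fpga_power_w
implied_gpu_fps = fpga_fps / speedup_vs_gpu
implied_gpu_power_w = fpga_power_w * power_ratio_vs_gpu

# Performance per power is FPS / W, so the gain should be roughly
# (speed ratio) * (power ratio) if both ratios were exact.
implied_efficiency_gain = speedup_vs_gpu * power_ratio_vs_gpu
relative_gap = abs(implied_efficiency_gain - reported_efficiency_gain) / reported_efficiency_gain

print(f"FPGA efficiency:          {fpga_fps_per_watt:.2f} FPS/W")
print(f"Implied GPU throughput:   {implied_gpu_fps:.1f} FPS")
print(f"Implied GPU power:        {implied_gpu_power_w:.1f} W")
print(f"Implied efficiency gain:  {implied_efficiency_gain:.2f}x")
print(f"Gap to reported 13.8x:    {relative_gap:.1%}")
```

The product of the two rounded ratios is 13.95, about 1% above the reported 13.8. A gap of that size is what one would expect if the 1.5 and 9.3 figures were themselves rounded, so the three reported ratios appear mutually consistent.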