FDLNet: Boosting Real-time Semantic Segmentation by Image-size Convolution via Frequency Domain Learning

2023 IEEE International Conference on Robotics and Automation (ICRA). Pub Date: 2023-05-29. DOI: 10.1109/ICRA48891.2023.10161421

Qingqing Yan, Shu Li, Chengju Liu, Meilin Liu, Qi Chen



Abstract

This paper proposes FDLNet, a novel real-time semantic segmentation network based on frequency domain learning, which revisits the segmentation task from two critical perspectives: spatial structure description and multi-level feature fusion. We first devise an image-size convolution (IS-Conv) as a global frequency-domain learning operator that captures long-range dependencies in a single shot. To model spatial structure information, we construct the global structure representation path (GSRP) based on IS-Conv, which learns a unified edge-region representation with affordable complexity. For efficient and lightweight multi-level feature fusion, we propose the factorized stereoscopic attention (FSA) module, which alleviates semantic confusion and reduces feature redundancy by introducing level-wise attention before channel and spatial attention. Combining the above modules, we obtain a concise semantic segmentation framework named FDLNet. Experiments demonstrate the effectiveness of the proposed method: FDLNet achieves state-of-the-art performance on Cityscapes, reporting 76.32% mIoU at 150+ FPS and 79.0% mIoU at 41+ FPS. The code is available at https://github.com/qyan0131/FDLNet.
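The abstract does not spell out how IS-Conv is computed, so the following is only a minimal sketch of the general idea it points to, under stated assumptions: a convolution whose kernel is as large as the feature map can be evaluated through the convolution theorem, as an element-wise product of spectra, instead of a sliding window. The function names `is_conv_fft` and `is_conv_direct` are hypothetical, the example handles a single channel with circular (wrap-around) boundaries, and it is not taken from the authors' repository.

```python
import numpy as np


def is_conv_fft(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular convolution of an (H, W) map with an (H, W) kernel via FFT.

    Assumption: this illustrates an image-size convolution in general,
    not the exact IS-Conv operator defined in the paper.
    """
    if x.shape != kernel.shape:
        raise ValueError("kernel must have the same spatial size as the input")
    # Convolution theorem: multiplying the two spectra element-wise is
    # equivalent to a circular convolution in the spatial domain.
    spectrum = np.fft.rfft2(x) * np.fft.rfft2(kernel)
    # Passing s= restores the original size, which matters for odd widths.
    return np.fft.irfft2(spectrum, s=x.shape)


def is_conv_direct(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Reference circular convolution in the spatial domain (slow, for checks)."""
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # out[i, j] = sum over (m, n) of x[m, n] * kernel[(i - m) % h, (j - n) % w]
            rows = (i - np.arange(h)) % h
            cols = (j - np.arange(w)) % w
            out[i, j] = np.sum(x * kernel[np.ix_(rows, cols)])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.standard_normal((6, 8))
    weights = rng.standard_normal((6, 8))
    fast = is_conv_fft(feature, weights)
    slow = is_conv_direct(feature, weights)
    print("max abs difference:", np.max(np.abs(fast - slow)))
```

Every output position depends on every input position, which is one way to read "long-range dependency in a single shot". The FFT route costs O(HW log(HW)) per channel, whereas the direct spatial sum costs O((HW)^2), which is why a frequency-domain formulation makes a kernel of this size affordable.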
