LiteMSNet: a lightweight semantic segmentation network with multi-scale feature extraction for urban streetscape scenes

The Visual Computer. Pub Date: 2024-07-22. DOI: 10.1007/s00371-024-03569-y

Lirong Li, Jiang Ding, Hao Cui, Zhiqiang Chen, Guisheng Liao



Abstract

Semantic segmentation plays a pivotal role in computer scene understanding, but it typically requires a large amount of computation to achieve high performance. To balance accuracy and complexity, we propose a lightweight semantic segmentation model, termed LiteMSNet (a Lightweight Semantic Segmentation Network with Multi-Scale Feature Extraction for urban streetscape scenes). In this model, we propose a novel Improved Feature Pyramid Network, which embeds a shuffle attention mechanism followed by a stacked Depth-wise Asymmetric Gating Module. Furthermore, a Multi-scale Dilation Pyramid Module is developed to expand the receptive field and capture multi-scale feature information. Finally, the proposed model combines two loss functions, Cross-Entropy and Dice Loss, which effectively mitigate data imbalance and gradient saturation. Experimental results on the CamVid dataset show an mIoU of 70.85% with fewer than 5M parameters and a real-time inference speed of 66.1 FPS, surpassing existing methods documented in the literature. The code for this work will be made available at https://github.com/River-ding/LiteMSNet.
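The abstract states that the model combines the Cross-Entropy and Dice losses but does not say how they are weighted. The following is a minimal pure-Python sketch of one common way to combine them, assuming a plain weighted sum; it is not the authors' implementation. The function names, the `ce_weight` and `dice_weight` parameters, the `smooth` term, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class over all pixels."""
    eps = 1e-12  # floor to avoid log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def dice_loss(probs, labels, num_classes, smooth=1.0):
    """1 minus the soft Dice coefficient, averaged over classes."""
    dice_sum = 0.0
    for c in range(num_classes):
        # Predicted probability mass that falls on pixels truly of class c.
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * intersection + smooth) / (pred_mass + true_mass + smooth)
    return 1.0 - dice_sum / num_classes

def combined_loss(logits, labels, num_classes, ce_weight=1.0, dice_weight=1.0):
    """Weighted sum of the two losses.

    Assumption: the weights and the simple sum are illustrative only;
    the abstract does not specify how the two terms are combined.
    """
    probs = [softmax(px) for px in logits]
    ce = cross_entropy(probs, labels)
    dice = dice_loss(probs, labels, num_classes)
    return ce_weight * ce + dice_weight * dice

# Toy example: four pixels, three classes, class 2 appears only once.
logits = [[4.0, 0.5, 0.1], [3.5, 0.2, 0.3], [0.3, 3.0, 0.2], [0.1, 0.4, 2.5]]
labels = [0, 0, 1, 2]
loss = combined_loss(logits, labels, num_classes=3)
print(f"combined loss: {loss:.4f}")
```

Because the Dice term is averaged per class rather than per pixel, a rarely occurring class contributes as much to it as a frequent one, which is the usual motivation for pairing it with Cross-Entropy on imbalanced data.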
