LiteMSNet: a lightweight semantic segmentation network with multi-scale feature extraction for urban streetscape scenes

Lirong Li, Jiang Ding, Hao Cui, Zhiqiang Chen, Guisheng Liao
The Visual Computer. Published 2024-07-22. DOI: 10.1007/s00371-024-03569-y (https://doi.org/10.1007/s00371-024-03569-y)

Abstract


Semantic segmentation plays a pivotal role in computer scene understanding, but it typically requires a large amount of computation to achieve high performance. To balance accuracy against complexity, we propose a lightweight semantic segmentation model, termed LiteMSNet (a Lightweight Semantic Segmentation Network with Multi-Scale Feature Extraction for urban streetscape scenes). The model introduces an Improved Feature Pyramid Network, which embeds a shuffle attention mechanism followed by a stacked Depth-wise Asymmetric Gating Module. In addition, a Multi-scale Dilation Pyramid Module expands the receptive field and captures multi-scale feature information. Finally, the model integrates two loss functions, Cross-Entropy and Dice Loss, which together mitigate data imbalance and gradient saturation. Experiments on the CamVid dataset show an mIoU of 70.85% with fewer than 5M parameters at a real-time inference speed of 66.1 FPS, surpassing the existing methods documented in the literature. The code for this work will be made available at https://github.com/River-ding/LiteMSNet.
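
To make the loss and the metric named in the abstract concrete, the sketch below combines cross-entropy with a soft Dice loss and computes mIoU, in plain Python on flattened pixel lists. It is an illustration under assumptions, not the authors' implementation: the abstract does not say how the two losses are weighted, so `ce_weight`, `dice_weight`, and the `smooth` term are hypothetical parameters, and all function names are invented for this example.

```python
import math


def cross_entropy(probs, labels, eps=1e-7):
    """Mean pixel-wise cross-entropy.

    probs[i][c] is the predicted probability of class c at pixel i;
    labels[i] is the ground-truth class index of pixel i.
    """
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


def dice_loss(probs, labels, num_classes, smooth=1.0):
    """One minus the soft Dice coefficient, averaged over classes.

    `smooth` (an assumed value) avoids division by zero for absent classes.
    """
    dice_sum = 0.0
    for c in range(num_classes):
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * intersection + smooth) / (pred_mass + true_mass + smooth)
    return 1.0 - dice_sum / num_classes


def combined_loss(probs, labels, num_classes, ce_weight=1.0, dice_weight=1.0):
    """Weighted sum of both losses; the equal default weights are an assumption."""
    return (ce_weight * cross_entropy(probs, labels)
            + dice_weight * dice_loss(probs, labels, num_classes))


def mean_iou(pred, labels, num_classes):
    """Mean intersection-over-union over classes present in pred or labels."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, y in zip(pred, labels) if p == c and y == c)
        union = sum(1 for p, y in zip(pred, labels) if p == c or y == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    labels = [0, 1]
    uniform = [[0.5, 0.5], [0.5, 0.5]]
    print(combined_loss(uniform, labels, num_classes=2))
    print(mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2))
```

The Dice term is computed per class from overlap ratios rather than per pixel, so a rare class contributes as much as a frequent one. This is the usual motivation for pairing it with cross-entropy on imbalanced data.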
