{"title":"LiteMSNet: a lightweight semantic segmentation network with multi-scale feature extraction for urban streetscape scenes","authors":"Lirong Li, Jiang Ding, Hao Cui, Zhiqiang Chen, Guisheng Liao","doi":"10.1007/s00371-024-03569-y","DOIUrl":null,"url":null,"abstract":"<p>Semantic segmentation plays a pivotal role in computer scene understanding, but it typically requires a large amount of computing to achieve high performance. To achieve a balance between accuracy and complexity, we propose a lightweight semantic segmentation model, termed LiteMSNet (a Lightweight Semantic Segmentation Network with Multi-Scale Feature Extraction for urban streetscape scenes). In this model, we propose a novel Improved Feature Pyramid Network, which embeds a shuffle attention mechanism followed by a stacked Depth-wise Asymmetric Gating Module. Furthermore, a Multi-scale Dilation Pyramid Module is developed to expand the receptive field and capture multi-scale feature information. Finally, the proposed lightweight model integrates two loss mechanisms, the Cross-Entropy and the Dice Loss functions, which effectively mitigate the issue of data imbalance and gradient saturation. Numerical experimental results on the CamVid dataset demonstrate a remarkable mIoU measurement of 70.85% with less than 5M parameters, accompanied by a real-time inference speed of 66.1 FPS, surpassing the existing methods documented in the literature. The code for this work will be made available at https://github.com/River-ding/LiteMSNet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"39 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03569-y","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Semantic segmentation plays a pivotal role in computer scene understanding, but it typically requires a large amount of computing to achieve high performance. To achieve a balance between accuracy and complexity, we propose a lightweight semantic segmentation model, termed LiteMSNet (a Lightweight Semantic Segmentation Network with Multi-Scale Feature Extraction for urban streetscape scenes). In this model, we propose a novel Improved Feature Pyramid Network, which embeds a shuffle attention mechanism followed by a stacked Depth-wise Asymmetric Gating Module. Furthermore, a Multi-scale Dilation Pyramid Module is developed to expand the receptive field and capture multi-scale feature information. Finally, the proposed lightweight model integrates two loss mechanisms, the Cross-Entropy and the Dice Loss functions, which effectively mitigate the issue of data imbalance and gradient saturation. Numerical experimental results on the CamVid dataset demonstrate a remarkable mIoU measurement of 70.85% with less than 5M parameters, accompanied by a real-time inference speed of 66.1 FPS, surpassing the existing methods documented in the literature. The code for this work will be made available at https://github.com/River-ding/LiteMSNet.