Real-Time Semantic Segmentation Algorithm for Street Scenes Based on Attention Mechanism and Feature Fusion

IF 2.6 3区工程技术 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Electronics Pub Date : 2024-09-18 DOI:10.3390/electronics13183699

Bao Wu, Xingzhong Xiong, Yong Wang

{"title":"Real-Time Semantic Segmentation Algorithm for Street Scenes Based on Attention Mechanism and Feature Fusion","authors":"Bao Wu, Xingzhong Xiong, Yong Wang","doi":"10.3390/electronics13183699","DOIUrl":null,"url":null,"abstract":"In computer vision, the task of semantic segmentation is crucial for applications such as autonomous driving and intelligent surveillance. However, achieving a balance between real-time performance and segmentation accuracy remains a significant challenge. Although Fast-SCNN is favored for its efficiency and low computational complexity, it still faces difficulties when handling complex street scene images. To address this issue, this paper presents an improved Fast-SCNN, aiming to enhance the accuracy and efficiency of semantic segmentation by incorporating a novel attention mechanism and an enhanced feature extraction module. Firstly, the integrated SimAM (Simple, Parameter-Free Attention Module) increases the network’s sensitivity to critical regions of the image and effectively adjusts the feature space weights across channels. Additionally, the refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through refined pooling levels. During the feature fusion stage, the introduction of an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention optimizes the network’s ability to process multi-scale information. Furthermore, the classifier module is extended by incorporating deeper convolutions and more complex convolutional structures, leading to a further improvement in model performance. These enhancements significantly improve the model’s ability to capture details and overall segmentation performance. Experimental results demonstrate that the proposed method excels in processing complex street scene images, achieving a mean Intersection over Union (mIoU) of 71.7% and 69.4% on the Cityscapes and CamVid datasets, respectively, while maintaining inference speeds of 81.4 fps and 113.6 fps. These results indicate that the proposed model effectively improves segmentation quality in complex street scenes while ensuring real-time processing capabilities.","PeriodicalId":11646,"journal":{"name":"Electronics","volume":"3 1","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/electronics13183699","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

In computer vision, the task of semantic segmentation is crucial for applications such as autonomous driving and intelligent surveillance. However, achieving a balance between real-time performance and segmentation accuracy remains a significant challenge. Although Fast-SCNN is favored for its efficiency and low computational complexity, it still faces difficulties when handling complex street scene images. To address this issue, this paper presents an improved Fast-SCNN, aiming to enhance the accuracy and efficiency of semantic segmentation by incorporating a novel attention mechanism and an enhanced feature extraction module. Firstly, the integrated SimAM (Simple, Parameter-Free Attention Module) increases the network’s sensitivity to critical regions of the image and effectively adjusts the feature space weights across channels. Additionally, the refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through refined pooling levels. During the feature fusion stage, the introduction of an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention optimizes the network’s ability to process multi-scale information. Furthermore, the classifier module is extended by incorporating deeper convolutions and more complex convolutional structures, leading to a further improvement in model performance. These enhancements significantly improve the model’s ability to capture details and overall segmentation performance. Experimental results demonstrate that the proposed method excels in processing complex street scene images, achieving a mean Intersection over Union (mIoU) of 71.7% and 69.4% on the Cityscapes and CamVid datasets, respectively, while maintaining inference speeds of 81.4 fps and 113.6 fps. These results indicate that the proposed model effectively improves segmentation quality in complex street scenes while ensuring real-time processing capabilities.

查看原文本刊更多论文

基于注意机制和特征融合的街景实时语义分割算法

在计算机视觉领域，语义分割任务对于自动驾驶和智能监控等应用至关重要。然而，如何在实时性和分割准确性之间取得平衡仍然是一项重大挑战。尽管 Fast-SCNN 因其高效率和低计算复杂度而备受青睐，但在处理复杂街景图像时仍面临困难。为解决这一问题，本文提出了一种改进的 Fast-SCNN，旨在通过集成新颖的注意机制和增强的特征提取模块来提高语义分割的准确性和效率。首先，集成的 SimAM（简单无参数注意力模块）提高了网络对图像关键区域的灵敏度，并有效调整了跨通道的特征空间权重。此外，全局特征提取模块中的精炼金字塔池化模块通过精炼池化水平捕获了更广泛的上下文信息。在特征融合阶段，增强型 DAB（深度非对称瓶颈）区块和 SE（挤压-激发）注意力的引入优化了网络处理多尺度信息的能力。此外，分类器模块也得到了扩展，加入了更深的卷积和更复杂的卷积结构，从而进一步提高了模型性能。这些改进大大提高了模型捕捉细节的能力和整体分割性能。实验结果表明，所提出的方法在处理复杂街景图像方面表现出色，在 Cityscapes 和 CamVid 数据集上的平均交叉比联合（mIoU）分别达到 71.7% 和 69.4%，同时推理速度保持在 81.4 fps 和 113.6 fps。这些结果表明，所提出的模型能有效提高复杂街道场景的分割质量，同时确保实时处理能力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Electronics Computer Science-Computer Networks and Communications

CiteScore

1.10

自引率

10.30%

发文量

3515

审稿时长

16.71 days

期刊介绍： Electronics (ISSN 2079-9292; CODEN: ELECGJ) is an international, open access journal on the science of electronics and its applications published quarterly online by MDPI.