Guanke Chen, Haibin Li, Yaqian Li, Wenming Zhang, Tao Song
Title: Parallel segmentation network for real-time semantic segmentation
Journal: Engineering Applications of Artificial Intelligence, vol. 148, Article 110487 (Journal Article)
Published: 2025-03-11
DOI: 10.1016/j.engappai.2025.110487
URL: https://www.sciencedirect.com/science/article/pii/S0952197625004877
Impact factor: 7.5 (JCR Q1, Automation & Control Systems)
Citation count: 0
Abstract
Real-time semantic segmentation has broad application prospects in autonomous driving and robot navigation. Recent real-time semantic segmentation networks mainly adopt either an encoder-decoder architecture or a multi-branch architecture, and each approach has its own advantages and limitations. Encoder-decoder models are generally better at extracting contextual information but may struggle to capture fine details and local spatial information. Multi-branch structures, on the other hand, excel at capturing boundary and spatial detail information but require an efficient and flexible feature fusion strategy to prevent information redundancy. To leverage the strengths of both approaches, we propose the Parallel Segmentation Network (PaSeNet), which adopts an asymmetric encoder-decoder structure and offers a new design direction for research and applications in real-time semantic segmentation. Specifically, in the encoding phase we design a main branch with a spatial information enhancement path and introduce a masked autoencoder based on self-supervised learning as an auxiliary branch that supplements the main branch in extracting details and local spatial information. Additionally, we propose the Grouped Aggregation Pyramid Pooling Module to optimize the extraction of contextual information. In the decoding phase, we introduce the Coordinate-Attention-Guided Decoder to effectively integrate the diverse information from the different branches. Extensive experiments on Cityscapes, the Cambridge-driving Labeled Video Database (CamVid), NightCity, and the Instance Segmentation in Aerial Images Dataset (iSAID) demonstrate that our method achieves competitive results. Specifically, PaSeNet-Base obtains 79.9% mean Intersection over Union (mIoU) at 55.6 frames per second (FPS) on the Cityscapes test set and 80.2% mIoU at 96.8 FPS on the CamVid test set.
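For readers unfamiliar with the accuracy metric quoted above, the following is a minimal sketch of how mean Intersection over Union is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names, the flattened label lists, and the `ignore_index=255` convention for unlabeled pixels are assumptions made for illustration only.

```python
# Illustrative sketch only: a common way to compute mIoU, not the paper's code.
# Assumption: labels are flat lists of integer class ids in [0, num_classes),
# and pixels whose ground truth equals ignore_index are excluded from scoring.

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels by (ground-truth class, predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled pixels
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other classes predicted as class c
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is unlabeled and ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
cm = confusion_matrix(pred, target, num_classes=3)
print(round(mean_iou(cm), 4))  # per-class IoU is 1/2, 2/3, and 1, so this prints 0.7222
```

In this toy case the three per-class scores average to 13/18. A benchmark figure such as the reported 79.9% is the same kind of average taken over all evaluated classes and all test pixels.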
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.