{"title":"A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion","authors":"Xue Weimin , Liu Yisha , Zhuang Yan","doi":"10.1016/j.neucom.2025.131023","DOIUrl":null,"url":null,"abstract":"<div><div>RGB-T image segmentation algorithms have been widely adopted in various fields, such as surveillance and autonomous driving. These algorithms typically employ separate encoders to ensure each branch extracts modality-specific features. Nevertheless, this design increases parameters and potential conflicts between multimodal features. An alternative solution is using a weight-sharing encoder, which facilitates consistent encoding across different data types, reducing the number of training parameters and enhancing the encoder’s generalization capability. However, a weight-sharing encoder tends to extract modality-shared features but under-represent modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. Subsequently, it utilizes these enhanced feature maps to generate relative modality-specific feature maps of the RGB and thermal modalities. PFD is proposed to progressively integrate multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking performance or competitive results. The code is available at: <span><span>https://github.com/bearxwm/WSRT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131023"},"PeriodicalIF":5.5000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225016959","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
RGB-T image segmentation algorithms have been widely adopted in fields such as surveillance and autonomous driving. These algorithms typically employ separate encoders so that each branch extracts modality-specific features. However, this design increases the parameter count and the potential for conflicts between multimodal features. An alternative is a weight-sharing encoder, which encodes the different data types consistently, reduces the number of trainable parameters, and improves the encoder’s generalization. A weight-sharing encoder, however, tends to extract modality-shared features while under-representing modality-specific ones, limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and a Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information in the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps, and then uses these enhanced maps to derive relative modality-specific feature maps for the RGB and thermal modalities. PFD progressively integrates multi-scale features from different stages to decode the enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking or competitive performance. The code is available at: https://github.com/bearxwm/WSRT.
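To make the pipeline the abstract describes concrete, here is a minimal PyTorch sketch: a single weight-sharing encoder applied to both modalities, an HFEM-style module per encoder stage, and a PFD-style coarse-to-fine decoder. The class names follow the abstract, but all internals (the toy backbone, residual-style modality-specific maps, 1x1 refinement convolutions) are illustrative assumptions, not the paper's actual architecture; see the linked repository for the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Tiny stand-in backbone; one instance encodes both modalities."""

    def __init__(self, stage_channels=(32, 64, 128)):
        super().__init__()
        chans = (3,) + tuple(stage_channels)
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(len(stage_channels))
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # multi-scale maps, finest to coarsest


class HFEM(nn.Module):
    """Hierarchical Feature Enhancement Module (illustrative internals)."""

    def __init__(self, channels):
        super().__init__()
        self.shared_fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.out = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, f_rgb, f_thermal):
        # Enhanced modality-shared map built from both encoded maps.
        f_shared = self.shared_fuse(torch.cat([f_rgb, f_thermal], dim=1))
        # Relative modality-specific maps; modeling them as residuals
        # against the shared map is an assumption made for this sketch.
        spec_rgb = f_rgb - f_shared
        spec_t = f_thermal - f_shared
        return self.out(torch.cat([f_shared, spec_rgb, spec_t], dim=1))


class PFD(nn.Module):
    """Progressive Fusion Decoder: coarse-to-fine integration of stages."""

    def __init__(self, stage_channels, num_classes):
        super().__init__()
        self.refines = nn.ModuleList(
            nn.Conv2d(c, stage_channels[0], 1) for c in stage_channels
        )
        self.head = nn.Conv2d(stage_channels[0], num_classes, 1)

    def forward(self, feats):
        x = self.refines[-1](feats[-1])  # start from the coarsest stage
        for refine, f in zip(reversed(self.refines[:-1]), reversed(feats[:-1])):
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = x + refine(f)  # progressively fold in the next-finer stage
        return self.head(x)


class WSRTSketch(nn.Module):
    def __init__(self, stage_channels=(32, 64, 128), num_classes=9):
        super().__init__()
        self.encoder = ToyEncoder(stage_channels)  # shared by both inputs
        self.hfems = nn.ModuleList(HFEM(c) for c in stage_channels)
        self.decoder = PFD(stage_channels, num_classes)

    def forward(self, rgb, thermal):
        feats_rgb = self.encoder(rgb)    # the same weights encode
        feats_t = self.encoder(thermal)  # both modalities
        enhanced = [m(fr, ft)
                    for m, fr, ft in zip(self.hfems, feats_rgb, feats_t)]
        return self.decoder(enhanced)


rgb = torch.randn(1, 3, 480, 640)
thermal = torch.randn(1, 3, 480, 640)  # thermal replicated to 3 channels
logits = WSRTSketch()(rgb, thermal)
print(logits.shape)                    # torch.Size([1, 9, 240, 320])
```

The property the sketch preserves is that the encoder is instantiated once, so both modalities pass through the same weights; this is what reduces the parameter count relative to dual-encoder designs, while the HFEM-style modules reintroduce the modality-specific information that pure weight sharing tends to under-represent.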
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.