{"title":"基于分级特征增强和渐进式特征融合的RGB-T图像语义分割网络","authors":"Xue Weimin , Liu Yisha , Zhuang Yan","doi":"10.1016/j.neucom.2025.131023","DOIUrl":null,"url":null,"abstract":"<div><div>RGB-T image segmentation algorithms have been widely adopted in various fields, such as surveillance and autonomous driving. These algorithms typically employ separate encoders to ensure each branch extracts modality-specific features. Nevertheless, this design increases parameters and potential conflicts between multimodal features. An alternative solution is using a weight-sharing encoder, which facilitates consistent encoding across different data types, reducing the number of training parameters and enhancing the encoder’s generalization capability. However, a weight-sharing encoder tends to extract modality-shared features but under-represent modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. Subsequently, it utilizes these enhanced feature maps to generate relative modality-specific feature maps of the RGB and thermal modalities. PFD is proposed to progressively integrate multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking performance or competitive results. The code is available at: <span><span>https://github.com/bearxwm/WSRT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131023"},"PeriodicalIF":5.5000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion\",\"authors\":\"Xue Weimin , Liu Yisha , Zhuang Yan\",\"doi\":\"10.1016/j.neucom.2025.131023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>RGB-T image segmentation algorithms have been widely adopted in various fields, such as surveillance and autonomous driving. These algorithms typically employ separate encoders to ensure each branch extracts modality-specific features. Nevertheless, this design increases parameters and potential conflicts between multimodal features. An alternative solution is using a weight-sharing encoder, which facilitates consistent encoding across different data types, reducing the number of training parameters and enhancing the encoder’s generalization capability. However, a weight-sharing encoder tends to extract modality-shared features but under-represent modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. 
To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. Subsequently, it utilizes these enhanced feature maps to generate relative modality-specific feature maps of the RGB and thermal modalities. PFD is proposed to progressively integrate multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking performance or competitive results. The code is available at: <span><span>https://github.com/bearxwm/WSRT</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"652 \",\"pages\":\"Article 131023\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-07-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225016959\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225016959","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion
RGB-T image segmentation algorithms have been widely adopted in fields such as surveillance and autonomous driving. These algorithms typically employ separate encoders so that each branch extracts modality-specific features. Nevertheless, this design increases the number of parameters and the potential for conflicts between multimodal features. An alternative is to use a weight-sharing encoder, which encodes different data types consistently, reduces the number of training parameters, and improves the encoder's generalization capability. However, a weight-sharing encoder tends to extract modality-shared features while under-representing modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and a Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. It then uses these enhanced feature maps to generate relative modality-specific feature maps for the RGB and thermal modalities. PFD progressively integrates multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking or competitive performance. The code is available at: https://github.com/bearxwm/WSRT.
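To make the described pipeline concrete, below is a minimal PyTorch sketch of how a hierarchical enhancement module and a progressive fusion decoder could be wired together. The abstract does not specify WSRT's internals, so every class name, layer choice, and in particular the subtraction-based reading of "relative modality-specific feature maps" is an assumption for illustration only; the authors' actual implementation is at https://github.com/bearxwm/WSRT.

# Minimal sketch of the ideas in the abstract. All module names, layers, and
# hyperparameters below are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalFeatureEnhancement(nn.Module):
    """Hypothetical HFEM: fuse the weight-sharing encoder's RGB and thermal
    outputs into an enhanced modality-shared map, then derive relative
    modality-specific maps for each branch."""

    def __init__(self, channels: int):
        super().__init__()
        self.shared_fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_rgb: torch.Tensor, f_th: torch.Tensor) -> torch.Tensor:
        # Enhanced modality-shared features from both encoded maps.
        shared = self.shared_fuse(torch.cat([f_rgb, f_th], dim=1))
        # Relative modality-specific features: what each modality carries
        # beyond the shared representation (one plausible reading of
        # "relative modality-specific feature maps").
        spec_rgb = f_rgb - shared
        spec_th = f_th - shared
        return shared + spec_rgb + spec_th  # enhanced stage output

class ProgressiveFusionDecoder(nn.Module):
    """Hypothetical PFD: integrate multi-scale stage features coarse-to-fine."""

    def __init__(self, stage_channels, num_classes: int):
        super().__init__()
        # Project every stage to a common channel width before fusion.
        self.refine = nn.ModuleList(
            nn.Conv2d(c, stage_channels[0], kernel_size=1) for c in stage_channels
        )
        self.classify = nn.Conv2d(stage_channels[0], num_classes, kernel_size=1)

    def forward(self, stage_feats):
        # stage_feats: list of enhanced maps, highest resolution first.
        out = None
        for feat, proj in zip(reversed(stage_feats), reversed(list(self.refine))):
            x = proj(feat)
            # Upsample the running decode to the current stage's resolution
            # and accumulate, moving progressively from coarse to fine.
            out = x if out is None else x + F.interpolate(
                out, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return self.classify(out)

if __name__ == "__main__":
    f_rgb = torch.randn(1, 64, 120, 160)  # toy RGB stage features
    f_th = torch.randn(1, 64, 120, 160)   # toy thermal stage features
    enhanced = HierarchicalFeatureEnhancement(64)(f_rgb, f_th)
    logits = ProgressiveFusionDecoder([64], num_classes=9)([enhanced])
    print(logits.shape)  # torch.Size([1, 9, 120, 160])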
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.