{"title":"A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion","authors":"Xue Weimin , Liu Yisha , Zhuang Yan","doi":"10.1016/j.neucom.2025.131023","DOIUrl":null,"url":null,"abstract":"<div><div>RGB-T image segmentation algorithms have been widely adopted in various fields, such as surveillance and autonomous driving. These algorithms typically employ separate encoders to ensure each branch extracts modality-specific features. Nevertheless, this design increases parameters and potential conflicts between multimodal features. An alternative solution is using a weight-sharing encoder, which facilitates consistent encoding across different data types, reducing the number of training parameters and enhancing the encoder’s generalization capability. However, a weight-sharing encoder tends to extract modality-shared features but under-represent modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. Subsequently, it utilizes these enhanced feature maps to generate relative modality-specific feature maps of the RGB and thermal modalities. PFD is proposed to progressively integrate multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking performance or competitive results. The code is available at: <span><span>https://github.com/bearxwm/WSRT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"652 ","pages":"Article 131023"},"PeriodicalIF":5.5000,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225016959","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
RGB-T image segmentation algorithms have been widely adopted in fields such as surveillance and autonomous driving. These algorithms typically employ separate encoders so that each branch extracts modality-specific features. However, this design increases the parameter count and the potential for conflicts between multimodal features. An alternative is a weight-sharing encoder, which encodes the different data types consistently, reduces the number of trainable parameters, and improves the encoder’s generalization. A weight-sharing encoder, however, tends to extract modality-shared features while under-representing modality-specific ones, limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and a Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information in the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps, and then uses these enhanced maps to derive relative modality-specific feature maps for the RGB and thermal modalities. PFD progressively integrates multi-scale features from different stages to decode the enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking or competitive performance. The code is available at: https://github.com/bearxwm/WSRT.
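To make the pipeline the abstract describes concrete, here is a minimal PyTorch sketch: a single weight-sharing encoder applied to both modalities, an HFEM-style module per encoder stage, and a PFD-style coarse-to-fine decoder. The class names follow the abstract, but all internals (the toy backbone, residual-style modality-specific maps, 1x1 refinement convolutions) are illustrative assumptions, not the paper's actual architecture; see the linked repository for the real implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Tiny stand-in backbone; one instance encodes both modalities."""

    def __init__(self, stage_channels=(32, 64, 128)):
        super().__init__()
        chans = (3,) + tuple(stage_channels)
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
            for i in range(len(stage_channels))
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # multi-scale maps, finest to coarsest


class HFEM(nn.Module):
    """Hierarchical Feature Enhancement Module (illustrative internals)."""

    def __init__(self, channels):
        super().__init__()
        self.shared_fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.out = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, f_rgb, f_thermal):
        # Enhanced modality-shared map built from both encoded maps.
        f_shared = self.shared_fuse(torch.cat([f_rgb, f_thermal], dim=1))
        # Relative modality-specific maps; modeling them as residuals
        # against the shared map is an assumption made for this sketch.
        spec_rgb = f_rgb - f_shared
        spec_t = f_thermal - f_shared
        return self.out(torch.cat([f_shared, spec_rgb, spec_t], dim=1))


class PFD(nn.Module):
    """Progressive Fusion Decoder: coarse-to-fine integration of stages."""

    def __init__(self, stage_channels, num_classes):
        super().__init__()
        self.refines = nn.ModuleList(
            nn.Conv2d(c, stage_channels[0], 1) for c in stage_channels
        )
        self.head = nn.Conv2d(stage_channels[0], num_classes, 1)

    def forward(self, feats):
        x = self.refines[-1](feats[-1])  # start from the coarsest stage
        for refine, f in zip(reversed(self.refines[:-1]), reversed(feats[:-1])):
            x = F.interpolate(x, size=f.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = x + refine(f)  # progressively fold in the next-finer stage
        return self.head(x)


class WSRTSketch(nn.Module):
    def __init__(self, stage_channels=(32, 64, 128), num_classes=9):
        super().__init__()
        self.encoder = ToyEncoder(stage_channels)  # shared by both inputs
        self.hfems = nn.ModuleList(HFEM(c) for c in stage_channels)
        self.decoder = PFD(stage_channels, num_classes)

    def forward(self, rgb, thermal):
        feats_rgb = self.encoder(rgb)    # the same weights encode
        feats_t = self.encoder(thermal)  # both modalities
        enhanced = [m(fr, ft)
                    for m, fr, ft in zip(self.hfems, feats_rgb, feats_t)]
        return self.decoder(enhanced)


rgb = torch.randn(1, 3, 480, 640)
thermal = torch.randn(1, 3, 480, 640)  # thermal replicated to 3 channels
logits = WSRTSketch()(rgb, thermal)
print(logits.shape)                    # torch.Size([1, 9, 240, 320])
```

The property the sketch preserves is that the encoder is instantiated once, so both modalities pass through the same weights; this is what reduces the parameter count relative to dual-encoder designs, while the HFEM-style modules reintroduce the modality-specific information that pure weight sharing tends to under-represent.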
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The journal covers neurocomputing theory, practice, and applications.