A weight-sharing based RGB-T image semantic segmentation network with hierarchical feature enhancement and progressive feature fusion

Impact Factor: 5.5 | CAS Tier 2, Computer Science | JCR Q1, Computer Science, Artificial Intelligence
Xue Weimin , Liu Yisha , Zhuang Yan
DOI: 10.1016/j.neucom.2025.131023
Journal: Neurocomputing, Vol. 652, Article 131023
Published: 2025-07-21 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0925231225016959
Citations: 0

Abstract

RGB-T image segmentation algorithms have been widely adopted in various fields, such as surveillance and autonomous driving. These algorithms typically employ separate encoders to ensure each branch extracts modality-specific features. Nevertheless, this design increases parameters and potential conflicts between multimodal features. An alternative solution is using a weight-sharing encoder, which facilitates consistent encoding across different data types, reducing the number of training parameters and enhancing the encoder’s generalization capability. However, a weight-sharing encoder tends to extract modality-shared features but under-represent modality-specific features, thereby limiting segmentation performance under heterogeneous sensor conditions. To preserve the modality-shared features of multimodal data while simultaneously enhancing modality-specific features, we propose a novel Weight-Sharing based RGB-T image semantic segmentation network (WSRT) with a Hierarchical Feature Enhancement Module (HFEM) and Progressive Fusion Decoder (PFD). HFEM first integrates the modality-shared information from the weight-sharing encoded feature maps to generate enhanced modality-shared feature maps. Subsequently, it utilizes these enhanced feature maps to generate relative modality-specific feature maps of the RGB and thermal modalities. PFD is proposed to progressively integrate multi-scale features from different stages to decode these enhanced features more effectively. Experimental results on multiple RGB-T image semantic segmentation datasets demonstrate that our method achieves top-ranking performance or competitive results. The code is available at: https://github.com/bearxwm/WSRT.
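The authors' actual implementation is in the linked GitHub repository. Purely as an illustrative sketch of the idea described in the abstract (not the paper's code), a weight-sharing encoder feeds both modalities through the same weights, and an HFEM-style step can be pictured as fusing a modality-shared map and then deriving relative modality-specific maps as residuals from it. The function names and the mean/residual choices below are assumptions for illustration only:

```python
# Toy sketch of the shared/specific feature decomposition described in the
# abstract. NOT the authors' implementation; names and operations are
# illustrative assumptions (elementwise mean for "shared", residuals for
# "specific"). Features are plain lists of floats for simplicity.

def encode(features, weights):
    """Weight-sharing 'encoder': the same weights transform either modality."""
    return [w * f for w, f in zip(weights, features)]

def hierarchical_enhance(rgb_feat, thermal_feat):
    """HFEM-style step: fuse modality-shared information, then derive
    relative modality-specific features as residuals from the shared map."""
    # Enhanced modality-shared features: here simply the elementwise mean.
    shared = [(r + t) / 2.0 for r, t in zip(rgb_feat, thermal_feat)]
    # Relative modality-specific features: what each modality adds beyond shared.
    rgb_specific = [r - s for r, s in zip(rgb_feat, shared)]
    thermal_specific = [t - s for t, s in zip(thermal_feat, shared)]
    return shared, rgb_specific, thermal_specific

weights = [0.5, 1.0, 2.0]                   # one weight set shared by both branches
rgb = encode([2.0, 4.0, 6.0], weights)      # -> [1.0, 4.0, 12.0]
thermal = encode([4.0, 2.0, 2.0], weights)  # -> [2.0, 2.0, 4.0]
shared, rgb_sp, th_sp = hierarchical_enhance(rgb, thermal)
print(shared)  # [1.5, 3.0, 8.0]
print(rgb_sp)  # [-0.5, 1.0, 4.0]
```

Note that in this sketch the specific features are residuals relative to the shared map, mirroring the abstract's phrasing of "relative modality-specific feature maps" generated from the enhanced shared ones.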
Source journal: Neurocomputing
Category: Engineering & Technology — Computer Science: Artificial Intelligence
CiteScore: 13.10
Self-citation rate: 10.00%
Articles per year: 1382
Review time: 70 days
Journal introduction: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.