{"title":"非对齐RGB-T语义分割的形变弹性多粒度学习。","authors":"Heng Zhou,Zhenxi Zhang,Chengyang Li,Chunna Tian,Yongqiang Xie,Zhongbo Li,Xiao-Jun Wu","doi":"10.1109/tnnls.2025.3585105","DOIUrl":null,"url":null,"abstract":"RGB-Thermal semantic segmentation (SS) aims to combine visual light and thermal images to determine the semantic category for each pixel and create an object mask. While existing methods typically rely on well-aligned RGB-T image pairs, real-world RGB-T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB-T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML explores the spatial consistency and modal complementarity of RGB-T and mitigates the interference of warped modalities by aligning multimodal features in a coarse-to-fine multigranularity strategy. Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB-T by estimating the deformation field of warped features. Then, CFA aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB-T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of our proposed DML, consistently outperforming existing methods designed for aligned RGB-T data. The new unaligned RGB-T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"13 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deformation-Resilient Multigranularity Learning for Unaligned RGB-T Semantic Segmentation.\",\"authors\":\"Heng Zhou,Zhenxi Zhang,Chengyang Li,Chunna Tian,Yongqiang Xie,Zhongbo Li,Xiao-Jun Wu\",\"doi\":\"10.1109/tnnls.2025.3585105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RGB-Thermal semantic segmentation (SS) aims to combine visual light and thermal images to determine the semantic category for each pixel and create an object mask. While existing methods typically rely on well-aligned RGB-T image pairs, real-world RGB-T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB-T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML explores the spatial consistency and modal complementarity of RGB-T and mitigates the interference of warped modalities by aligning multimodal features in a coarse-to-fine multigranularity strategy. 
Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB-T by estimating the deformation field of warped features. Then, CFA aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB-T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of our proposed DML, consistently outperforming existing methods designed for aligned RGB-T data. The new unaligned RGB-T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3585105\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3585105","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Deformation-Resilient Multigranularity Learning for Unaligned RGB-T Semantic Segmentation.
RGB-Thermal semantic segmentation (SS) aims to combine visible-light and thermal images to determine the semantic category of each pixel and produce an object mask. While existing methods typically rely on well-aligned RGB-T image pairs, real-world RGB-T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB-T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML exploits the spatial consistency and modal complementarity of RGB-T data and mitigates the interference of warped modalities by aligning multimodal features with a coarse-to-fine multigranularity strategy. Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB-T features by estimating the deformation field of warped features. CFA then aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB-T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of the proposed DML, which consistently outperforms existing methods designed for aligned RGB-T data. The new unaligned RGB-T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation.
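To make the deformation-aware alignment idea concrete, below is a minimal PyTorch-style sketch of one alignment step: a small head regresses a dense offset (deformation) field from the concatenated RGB and thermal features, and the thermal branch is warped toward the RGB branch by grid sampling before fusion. The module name, layer sizes, tanh-bounded offsets, and fusion by summation are illustrative assumptions, not the authors' DFA/CFA implementation.

```python
# Hypothetical sketch of deformation-aware feature alignment (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformationAwareAlignment(nn.Module):
    """Estimates a dense 2-D deformation field from concatenated RGB and
    thermal features, then warps the thermal features toward the RGB ones."""

    def __init__(self, channels: int):
        super().__init__()
        # Small head that regresses per-pixel (dx, dy) offsets.
        self.offset_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=3, padding=1),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_thermal: torch.Tensor) -> torch.Tensor:
        n, _, h, w = feat_rgb.shape
        # Predict a deformation field, bounded to grid_sample's [-1, 1] coordinate range.
        offsets = torch.tanh(self.offset_head(torch.cat([feat_rgb, feat_thermal], dim=1)))

        # Identity sampling grid in normalized coordinates, shape (N, H, W, 2) ordered as (x, y).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat_rgb.device),
            torch.linspace(-1, 1, w, device=feat_rgb.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)

        # Warp thermal features with the estimated field and fuse by summation.
        warped_thermal = F.grid_sample(
            feat_thermal,
            base_grid + offsets.permute(0, 2, 3, 1),
            mode="bilinear",
            padding_mode="border",
            align_corners=True,
        )
        return feat_rgb + warped_thermal


# Example usage on dummy feature maps.
if __name__ == "__main__":
    rgb = torch.randn(2, 64, 60, 80)
    thermal = torch.randn(2, 64, 60, 80)
    aligned = DeformationAwareAlignment(64)(rgb, thermal)
    print(aligned.shape)  # torch.Size([2, 64, 60, 80])
```

In a coarse-to-fine pipeline such as the one the abstract describes, a block like this would presumably be applied at several encoder scales, with the complementary aggregation and mask-refinement stages operating on the aligned features.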
Journal Introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.