{"title":"非对齐RGB-T语义分割的形变弹性多粒度学习。","authors":"Heng Zhou,Zhenxi Zhang,Chengyang Li,Chunna Tian,Yongqiang Xie,Zhongbo Li,Xiao-Jun Wu","doi":"10.1109/tnnls.2025.3585105","DOIUrl":null,"url":null,"abstract":"RGB-Thermal semantic segmentation (SS) aims to combine visual light and thermal images to determine the semantic category for each pixel and create an object mask. While existing methods typically rely on well-aligned RGB-T image pairs, real-world RGB-T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB-T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML explores the spatial consistency and modal complementarity of RGB-T and mitigates the interference of warped modalities by aligning multimodal features in a coarse-to-fine multigranularity strategy. Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB-T by estimating the deformation field of warped features. Then, CFA aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB-T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of our proposed DML, consistently outperforming existing methods designed for aligned RGB-T data. The new unaligned RGB-T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"13 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deformation-Resilient Multigranularity Learning for Unaligned RGB-T Semantic Segmentation.\",\"authors\":\"Heng Zhou,Zhenxi Zhang,Chengyang Li,Chunna Tian,Yongqiang Xie,Zhongbo Li,Xiao-Jun Wu\",\"doi\":\"10.1109/tnnls.2025.3585105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"RGB-Thermal semantic segmentation (SS) aims to combine visual light and thermal images to determine the semantic category for each pixel and create an object mask. While existing methods typically rely on well-aligned RGB-T image pairs, real-world RGB-T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB-T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML explores the spatial consistency and modal complementarity of RGB-T and mitigates the interference of warped modalities by aligning multimodal features in a coarse-to-fine multigranularity strategy. 
Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB-T by estimating the deformation field of warped features. Then, CFA aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB-T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of our proposed DML, consistently outperforming existing methods designed for aligned RGB-T data. The new unaligned RGB-T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"13 1\",\"pages\":\"\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3585105\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3585105","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Deformation-Resilient Multigranularity Learning for Unaligned RGB-T Semantic Segmentation.
RGB-Thermal semantic segmentation (SS) aims to combine visible-light and thermal images to determine the semantic category of each pixel and produce an object mask. While existing methods typically rely on well-aligned RGB-T image pairs, real-world RGB-T pairs are often unaligned, and pixel-by-pixel alignment is both challenging and time-consuming. To address this critical issue, we introduce a new unaligned RGB-T SS benchmark and propose the deformation-resilient multigranularity learning (DML) method. DML exploits the spatial consistency and modal complementarity of RGB-T data and mitigates the interference of warped modalities by aligning multimodal features with a coarse-to-fine multigranularity strategy. Specifically, DML constructs a deformation-aware complementary feature enhancer (DCFE), which consists of deformation-aware feature alignment (DFA) and complementary feature aggregation (CFA) modules. DFA enhances the spatial alignment of RGB-T features by estimating the deformation field of warped features. CFA then aggregates complementary contexts of modal differences across multiple scales to produce deformation-resilient and robust RGB-T feature representations. Finally, we design the multigranularity mask refinement engine (MMFE), which combines class-agnostic saliency prediction (CSP) and class-aware edge generation (CEG) auxiliary tasks to provide useful boundary and positional cues for SS decoders. The MMFE enhances semantic alignment and interclass separability, yielding object masks with sharp boundaries. Quantitative and qualitative experiments on aligned and unaligned datasets validate the effectiveness of the proposed DML, which consistently outperforms existing methods designed for aligned RGB-T data. The new unaligned RGB-T SS benchmark and code are available at https://github.com/VisionVerse/Unaligned-RGBT-Semantic-Segmentation.
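To make the deformation-aware alignment idea concrete, below is a minimal PyTorch-style sketch of one alignment step: a small head regresses a dense offset (deformation) field from the concatenated RGB and thermal features, and the thermal branch is warped toward the RGB branch by grid sampling before fusion. The module name, layer sizes, tanh-bounded offsets, and fusion by summation are illustrative assumptions, not the authors' DFA/CFA implementation.

```python
# Hypothetical sketch of deformation-aware feature alignment (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformationAwareAlignment(nn.Module):
    """Estimates a dense 2-D deformation field from concatenated RGB and
    thermal features, then warps the thermal features toward the RGB ones."""

    def __init__(self, channels: int):
        super().__init__()
        # Small head that regresses per-pixel (dx, dy) offsets.
        self.offset_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 2, kernel_size=3, padding=1),
        )

    def forward(self, feat_rgb: torch.Tensor, feat_thermal: torch.Tensor) -> torch.Tensor:
        n, _, h, w = feat_rgb.shape
        # Predict a deformation field, bounded to grid_sample's [-1, 1] coordinate range.
        offsets = torch.tanh(self.offset_head(torch.cat([feat_rgb, feat_thermal], dim=1)))

        # Identity sampling grid in normalized coordinates, shape (N, H, W, 2) ordered as (x, y).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=feat_rgb.device),
            torch.linspace(-1, 1, w, device=feat_rgb.device),
            indexing="ij",
        )
        base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)

        # Warp thermal features with the estimated field and fuse by summation.
        warped_thermal = F.grid_sample(
            feat_thermal,
            base_grid + offsets.permute(0, 2, 3, 1),
            mode="bilinear",
            padding_mode="border",
            align_corners=True,
        )
        return feat_rgb + warped_thermal


# Example usage on dummy feature maps.
if __name__ == "__main__":
    rgb = torch.randn(2, 64, 60, 80)
    thermal = torch.randn(2, 64, 60, 80)
    aligned = DeformationAwareAlignment(64)(rgb, thermal)
    print(aligned.shape)  # torch.Size([2, 64, 60, 80])
```

In a coarse-to-fine pipeline such as the one the abstract describes, a block like this would presumably be applied at several encoder scales, with the complementary aggregation and mask-refinement stages operating on the aligned features.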
Journal Introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.