Towards efficient RGB-T semantic segmentation via feature generative distillation strategy

Shenlu Zhao, Jingyi Wang, Qiang Zhang, Jungong Han
{"title":"Towards efficient RGB-T semantic segmentation via feature generative distillation strategy","authors":"Shenlu Zhao , Jingyi Wang , Qiang Zhang , Jungong Han","doi":"10.1016/j.inffus.2025.103282","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speeds. Technically, the crux of these models lies in the feature imitative distillation-based strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gaps in the representation capability between the student and teacher models, such feature imitative distillation-based strategies may not achieve the anticipatory knowledge transfer performance in an efficient way. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which includes a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gaps between multimodal feature extraction and complementary information excavation by using Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with lower number of parameters and computational complexity.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103282"},"PeriodicalIF":14.7000,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525003550","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speed. Technically, the crux of these models lies in feature imitative distillation strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gap in representation capability between the student and teacher models, such feature imitative distillation strategies may not achieve the anticipated knowledge transfer performance efficiently. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which comprises a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gap between multimodal feature extraction and complementary information mining by using a Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with fewer parameters and lower computational complexity.
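To make the generative distillation idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a conditional VAE that learns to generate teacher-style features conditioned on student features. All module names, channel sizes, and the loss weighting are illustrative assumptions.

```python
# Hedged sketch of CVAE-based feature generative distillation (assumed design,
# not the paper's code): a recognition network encodes (student, teacher)
# features into a latent code; a decoder regenerates teacher-style features
# from that code conditioned on the student features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureCVAE(nn.Module):
    def __init__(self, student_ch=256, teacher_ch=512, latent_dim=128):
        super().__init__()
        # q(z | teacher, student): encodes both feature maps into a latent code.
        self.encoder = nn.Sequential(
            nn.Conv2d(student_ch + teacher_ch, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        # p(teacher | z, student): generates teacher-style features
        # conditioned on the student features.
        self.decoder = nn.Sequential(
            nn.Conv2d(student_ch + latent_dim, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, teacher_ch, 3, padding=1),
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, student_feat, teacher_feat=None):
        b, _, h, w = student_feat.shape
        if teacher_feat is not None:
            # Training: infer the latent code from both feature maps.
            enc = self.encoder(torch.cat([student_feat, teacher_feat], dim=1)).flatten(1)
            mu, logvar = self.fc_mu(enc), self.fc_logvar(enc)
            z = self.reparameterize(mu, logvar)
        else:
            # Inference: only the student is available; sample z from the prior.
            mu = logvar = None
            z = torch.randn(b, self.fc_mu.out_features, device=student_feat.device)
        z_map = z.view(b, -1, 1, 1).expand(b, z.size(1), h, w)
        generated = self.decoder(torch.cat([student_feat, z_map], dim=1))
        return generated, mu, logvar


def generative_distillation_loss(generated, teacher_feat, mu, logvar, beta=1e-3):
    """Reconstruction term pulls generated features toward the teacher's;
    the KL term regularizes the latent code toward a standard normal prior."""
    recon = F.mse_loss(generated, teacher_feat)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

In this sketch the student is asked to generate, rather than merely imitate, the teacher's features, which is one plausible reading of how a generative strategy can sidestep the representation-capability gap noted in the abstract.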
About the Journal
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.