Towards efficient RGB-T semantic segmentation via feature generative distillation strategy

Shenlu Zhao, Jingyi Wang, Qiang Zhang, Jungong Han
{"title":"Towards efficient RGB-T semantic segmentation via feature generative distillation strategy","authors":"Shenlu Zhao , Jingyi Wang , Qiang Zhang , Jungong Han","doi":"10.1016/j.inffus.2025.103282","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speeds. Technically, the crux of these models lies in the feature imitative distillation-based strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gaps in the representation capability between the student and teacher models, such feature imitative distillation-based strategies may not achieve the anticipatory knowledge transfer performance in an efficient way. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which includes a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gaps between multimodal feature extraction and complementary information excavation by using Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with lower number of parameters and computational complexity.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103282"},"PeriodicalIF":14.7000,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525003550","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speed. Technically, the crux of these models lies in feature imitative distillation strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gap in representation capability between the student and teacher models, such feature imitative distillation strategies may not achieve the anticipated knowledge transfer performance efficiently. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which comprises a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gap between multimodal feature extraction and complementary information mining by using a Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with fewer parameters and lower computational complexity.
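To make the generative distillation idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a conditional VAE that learns to generate teacher-style features conditioned on student features. All module names, channel sizes, and the loss weighting are illustrative assumptions.

```python
# Hedged sketch of CVAE-based feature generative distillation (assumed design,
# not the paper's code): a recognition network encodes (student, teacher)
# features into a latent code; a decoder regenerates teacher-style features
# from that code conditioned on the student features.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureCVAE(nn.Module):
    def __init__(self, student_ch=256, teacher_ch=512, latent_dim=128):
        super().__init__()
        # q(z | teacher, student): encodes both feature maps into a latent code.
        self.encoder = nn.Sequential(
            nn.Conv2d(student_ch + teacher_ch, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        # p(teacher | z, student): generates teacher-style features
        # conditioned on the student features.
        self.decoder = nn.Sequential(
            nn.Conv2d(student_ch + latent_dim, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, teacher_ch, 3, padding=1),
        )

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def forward(self, student_feat, teacher_feat=None):
        b, _, h, w = student_feat.shape
        if teacher_feat is not None:
            # Training: infer the latent code from both feature maps.
            enc = self.encoder(torch.cat([student_feat, teacher_feat], dim=1)).flatten(1)
            mu, logvar = self.fc_mu(enc), self.fc_logvar(enc)
            z = self.reparameterize(mu, logvar)
        else:
            # Inference: only the student is available; sample z from the prior.
            mu = logvar = None
            z = torch.randn(b, self.fc_mu.out_features, device=student_feat.device)
        z_map = z.view(b, -1, 1, 1).expand(b, z.size(1), h, w)
        generated = self.decoder(torch.cat([student_feat, z_map], dim=1))
        return generated, mu, logvar


def generative_distillation_loss(generated, teacher_feat, mu, logvar, beta=1e-3):
    """Reconstruction term pulls generated features toward the teacher's;
    the KL term regularizes the latent code toward a standard normal prior."""
    recon = F.mse_loss(generated, teacher_feat)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

In this sketch the student is asked to generate, rather than merely imitate, the teacher's features, which is one plausible reading of how a generative strategy can sidestep the representation-capability gap noted in the abstract.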
About the Journal
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.