Towards efficient RGB-T semantic segmentation via feature generative distillation strategy

IF 14.7 | CAS Tier 1, Computer Science | JCR Q1, Computer Science, Artificial Intelligence
Shenlu Zhao, Jingyi Wang, Qiang Zhang, Jungong Han
{"title":"Towards efficient RGB-T semantic segmentation via feature generative distillation strategy","authors":"Shenlu Zhao ,&nbsp;Jingyi Wang ,&nbsp;Qiang Zhang ,&nbsp;Jungong Han","doi":"10.1016/j.inffus.2025.103282","DOIUrl":null,"url":null,"abstract":"<div><div>Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speeds. Technically, the crux of these models lies in the feature imitative distillation-based strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gaps in the representation capability between the student and teacher models, such feature imitative distillation-based strategies may not achieve the anticipatory knowledge transfer performance in an efficient way. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which includes a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gaps between multimodal feature extraction and complementary information excavation by using Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with lower number of parameters and computational complexity.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"123 ","pages":"Article 103282"},"PeriodicalIF":14.7000,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525003550","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Recently, multimodal knowledge distillation-based methods for RGB-T semantic segmentation have been developed to enhance segmentation performance and inference speed. Technically, the crux of these models lies in feature imitative distillation strategies, where the student models imitate the working principles of the teacher models through loss functions. Unfortunately, due to the significant gap in representation capability between the student and teacher models, such feature imitative distillation strategies may not achieve the anticipated knowledge transfer performance efficiently. In this paper, we propose a novel feature generative distillation strategy for efficient RGB-T semantic segmentation, embodied in the Feature Generative Distillation-based Network (FGDNet), which comprises a teacher model (FGDNet-T) and a student model (FGDNet-S). This strategy bridges the gap between multimodal feature extraction and complementary information mining by using a Conditional Variational Auto-Encoder (CVAE) to generate teacher features from student features. Additionally, Multimodal Complementarity Separation modules (MCS-L and MCS-H) are introduced to separate complementary features at different levels. Comprehensive experimental results on four public benchmarks demonstrate that, compared with mainstream RGB-T semantic segmentation methods, our FGDNet-S achieves competitive segmentation performance with fewer parameters and lower computational complexity.
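The abstract gives no implementation details, but the core mechanism it describes, using a CVAE to generate teacher features conditioned on student features rather than forcing the student to directly imitate the teacher, can be sketched roughly as follows. This is a minimal PyTorch illustration under stated assumptions: the module and function names (`FeatureCVAE`, `generative_distillation_loss`), the pooled 2-D feature shapes, and the loss weighting are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of CVAE-based feature generative distillation.
# Not the authors' implementation: names, shapes, and weights are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCVAE(nn.Module):
    """Generates teacher-like features conditioned on student features."""
    def __init__(self, student_dim: int, teacher_dim: int, latent_dim: int = 64):
        super().__init__()
        # Encoder: infer a latent distribution from the (student, teacher) pair.
        self.encoder = nn.Sequential(
            nn.Linear(student_dim + teacher_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)
        # Decoder: generate teacher features from the latent code plus the
        # student features acting as the CVAE condition.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + student_dim, 256), nn.ReLU(),
            nn.Linear(256, teacher_dim))

    def forward(self, f_student, f_teacher):
        h = self.encoder(torch.cat([f_student, f_teacher], dim=1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        f_generated = self.decoder(torch.cat([z, f_student], dim=1))
        return f_generated, mu, logvar

def generative_distillation_loss(cvae, f_student, f_teacher, beta=0.1):
    """Reconstruction of (frozen) teacher features plus KL regularization."""
    f_generated, mu, logvar = cvae(f_student, f_teacher)
    recon = F.mse_loss(f_generated, f_teacher.detach())
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```

Under this reading, the CVAE and the teacher features serve only as a training-time transfer signal; at inference, only the student backbone is kept, which is consistent with the abstract's claim of fewer parameters and lower computational complexity for FGDNet-S.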
Source journal

Information Fusion (Engineering & Technology / Computer Science: Theory & Methods)

CiteScore: 33.20
Self-citation rate: 4.30%
Annual publications: 161
Review time: 7.9 months

About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems are welcome.