Efficient semantic segmentation via logit-guided feature distillation

IF 6.3 | Zone 1 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks Pub Date : 2026-07-01 Epub Date: 2026-01-29 DOI:10.1016/j.neunet.2026.108663

Xuyi Yu, Shang Lou, Yinghai Zhao, Huipeng Zhang, Kuizhi Mei

Neural Networks, Volume 199, Article 108663. Journal Article. https://www.sciencedirect.com/science/article/pii/S0893608026001255

Citations: 0

Abstract

Knowledge Distillation (KD) is a critical technique for model compression: it transfers implicit knowledge from a teacher model to a more compact, deployable student model. KD methods generally fall into two categories: logit distillation and feature distillation. Feature distillation has been predominant in achieving state-of-the-art (SOTA) performance, but recent advances in logit distillation have begun to narrow the gap. We propose a Logit-guided Feature Distillation (LFD) framework that combines the strengths of logit and feature distillation to make knowledge transfer more effective, in particular by leveraging the rich classification information inherent in logits for semantic segmentation tasks. Furthermore, we observe that Deep Neural Networks (DNNs) only manifest task-relevant characteristics at sufficient depth, which may limit the accuracy they can reach. In this work, we introduce a collaborative distillation method that focuses on critical pixels and categories already in the early stages. We use logits from deep layers to generate fine-grained spatial masks that are passed directly to the feature distillation stage, thereby inducing spatial gradient disparities. Additionally, we generate class masks that dynamically modulate the weights of shallow auxiliary heads, ensuring that class-relevant features can be calibrated by the primary head. A novel shared auxiliary head distillation approach is also presented. Experiments on the Cityscapes, Pascal VOC, and CamVid datasets show that the proposed method achieves competitive performance while maintaining low memory usage. Our code will be released at https://github.com/fate2715/LFD.
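The abstract does not say how the spatial masks are computed, so the following is only a minimal sketch of the general idea of letting logits steer a feature distillation loss. It assumes, purely for illustration, that a pixel's mask weight is its teacher-student KL divergence rescaled to mean 1, and that student features have already been projected to the teacher's channel count. The function names `log_softmax`, `pixel_kl`, `spatial_mask`, and `masked_feature_loss` are hypothetical and are not taken from the LFD code. Pixels are stored as flat Python lists to keep the example dependency-free; a real implementation would use a tensor library.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over one pixel's class scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def pixel_kl(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) for one pixel: the usual logit distillation term."""
    log_p_s = log_softmax(student_scores, temperature)
    log_p_t = log_softmax(teacher_scores, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return max(0.0, kl)  # clamp tiny negative values caused by rounding


def spatial_mask(student_logits, teacher_logits, temperature=1.0):
    """ASSUMPTION: mask weight = per-pixel KL, rescaled so the mask has mean 1.

    Pixels where the student's prediction disagrees most with the teacher's
    receive the largest weight. The paper's actual mask may differ.
    """
    kl = [pixel_kl(s, t, temperature)
          for s, t in zip(student_logits, teacher_logits)]
    mean_kl = sum(kl) / len(kl)
    if mean_kl <= 1e-12:  # student already matches the teacher everywhere
        return [1.0] * len(kl)
    return [k / mean_kl for k in kl]


def masked_feature_loss(student_feats, teacher_feats, mask):
    """Mask-weighted mean squared error between per-pixel feature vectors."""
    total = 0.0
    for f_s, f_t, weight in zip(student_feats, teacher_feats, mask):
        sq_err = sum((a - b) ** 2 for a, b in zip(f_s, f_t)) / len(f_t)
        total += weight * sq_err
    return total / len(mask)


# Toy input: 2 pixels, 3 classes, 2 feature channels (all values made up).
teacher_logits = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
student_logits = [[4.0, 0.0, 0.0], [2.0, 1.0, 0.0]]  # disagrees on pixel 1 only
teacher_feats = [[1.0, 0.0], [0.0, 1.0]]
student_feats = [[0.0, 0.0], [0.0, 0.5]]

mask = spatial_mask(student_logits, teacher_logits)
uniform = [1.0] * len(mask)
print("mask:", mask)
print("plain feature loss: ", masked_feature_loss(student_feats, teacher_feats, uniform))
print("masked feature loss:", masked_feature_loss(student_feats, teacher_feats, mask))
```

With this weighting, the feature loss (and hence its gradient) concentrates on pixels where the logits disagree, which is one way to obtain the spatially uneven gradients the abstract describes. Because a pixel with perfect logit agreement gets zero weight here, a practical variant would probably add a floor to the mask.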




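Source journal

Neural Networks (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days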

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. The journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, it aims to encourage the development of biologically inspired artificial intelligence.