MFKD: Multi-dimensional feature alignment for knowledge distillation

IF 4.2 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing Pub Date : 2025-03-22 DOI:10.1016/j.imavis.2025.105514

Zhen Guo , Pengzhou Zhang , Peng Liang

{"title":"MFKD: Multi-dimensional feature alignment for knowledge distillation","authors":"Zhen Guo , Pengzhou Zhang , Peng Liang","doi":"10.1016/j.imavis.2025.105514","DOIUrl":null,"url":null,"abstract":"<div><div>Knowledge distillation is a popular technique for compressing and transferring models in the field of deep learning. However, existing distillation methods often focus on optimizing a single dimension and overlook the importance of aligning and transforming knowledge across multiple dimensions, leading to suboptimal results. In this article, we introduce a novel approach called multi-dimensional feature alignment for knowledge distillation (MFKD) to address this limitation. The MFKD framework is built on the observation that knowledge from different dimensions can complement each other effectively. We extract knowledge from features in the spatcial, sample and channel dimensions separately. Our spatial-level part separates the foreground and background information, guiding the student to focus on crucial image regions by mimicking the teacher’s spatial and channel attention maps. Our sample-level part distills knowledge encoded in semantic correlations between sample activations by aligning the student’s activations to emulate the teacher’s clustering patterns using the Spearman correlation coefficient. Furthermore, our channel-level part encourages the student to learn standardized feature representations aligned with the teacher’s channel-wise interdependencies. Finally, we dynamically balance the loss factors of the different dimensions to optimize the overall performance of the distillation process. To validate the effectiveness of our methodology, we conduct experiments on benchmark datasets such as CIFAR-100, ImageNet and COCO. The experimental results demonstrate substantial performance improvements compared to baseline and recent state-of-the-art methods, confirming the efficacy of our MFKD framework. Furthermore, we provide a comprehensive analysis of the experimental results, offering deeper insight into the benefits and effectiveness of our approach. Through this analysis, we reinforce the significance of aligning and leveraging knowledge across multiple dimensions in knowledge distillation.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"157 ","pages":"Article 105514"},"PeriodicalIF":4.2000,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625001027","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Knowledge distillation is a popular technique for compressing and transferring models in the field of deep learning. However, existing distillation methods often focus on optimizing a single dimension and overlook the importance of aligning and transforming knowledge across multiple dimensions, leading to suboptimal results. In this article, we introduce a novel approach called multi-dimensional feature alignment for knowledge distillation (MFKD) to address this limitation. The MFKD framework is built on the observation that knowledge from different dimensions can complement each other effectively. We extract knowledge from features in the spatcial, sample and channel dimensions separately. Our spatial-level part separates the foreground and background information, guiding the student to focus on crucial image regions by mimicking the teacher’s spatial and channel attention maps. Our sample-level part distills knowledge encoded in semantic correlations between sample activations by aligning the student’s activations to emulate the teacher’s clustering patterns using the Spearman correlation coefficient. Furthermore, our channel-level part encourages the student to learn standardized feature representations aligned with the teacher’s channel-wise interdependencies. Finally, we dynamically balance the loss factors of the different dimensions to optimize the overall performance of the distillation process. To validate the effectiveness of our methodology, we conduct experiments on benchmark datasets such as CIFAR-100, ImageNet and COCO. The experimental results demonstrate substantial performance improvements compared to baseline and recent state-of-the-art methods, confirming the efficacy of our MFKD framework. Furthermore, we provide a comprehensive analysis of the experimental results, offering deeper insight into the benefits and effectiveness of our approach. Through this analysis, we reinforce the significance of aligning and leveraging knowledge across multiple dimensions in knowledge distillation.

查看原文本刊更多论文

知识蒸馏的多维特征对齐

知识蒸馏是深度学习领域中对模型进行压缩和转移的一种常用技术。然而，现有的精馏方法往往侧重于单个维度的优化，而忽略了跨多个维度对齐和转换知识的重要性，从而导致次优结果。在本文中，我们介绍了一种称为多维特征对齐知识蒸馏（MFKD）的新方法来解决这一限制。MFKD框架建立在观察到来自不同维度的知识可以有效地相互补充的基础上。我们分别从空间、样本和通道维度的特征中提取知识。我们的空间级部分将前景和背景信息分开，通过模仿教师的空间和通道注意图，引导学生关注关键的图像区域。我们的样本级部分通过使用Spearman相关系数对齐学生的激活来模拟教师的聚类模式，从而提取样本激活之间语义相关性编码的知识。此外，我们的渠道级部分鼓励学生学习与教师的渠道相互依赖关系一致的标准化特征表示。最后，动态平衡各维度的损失因子，优化精馏过程的整体性能。为了验证我们方法的有效性，我们在CIFAR-100、ImageNet和COCO等基准数据集上进行了实验。实验结果表明，与基线和最近最先进的方法相比，性能有了实质性的提高，证实了我们的MFKD框架的有效性。此外，我们提供了实验结果的综合分析，提供了更深入的了解我们的方法的好处和有效性。通过这一分析，我们强调了在知识蒸馏中跨多个维度对齐和利用知识的重要性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Image and Vision Computing 工程技术-工程：电子与电气

CiteScore

8.50

自引率

8.50%

发文量

143

审稿时长

7.8 months

期刊介绍： Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.