Title: Class Discriminative Knowledge Distillation
Authors: Shuoxi Zhang; Hanpeng Liu; Yuyi Wang; Kun He; Jun Lin; Yang Zeng
DOI: 10.1109/TETCI.2025.3529896
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 2, pp. 1340-1351
Published: 2025-01-29 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10857646/
Citations: 0
Abstract
Knowledge distillation aims to transfer knowledge from a large teacher model to a lightweight student model, enabling the student to achieve performance comparable to the teacher. Existing methods explore various strategies for distillation, including soft logits, intermediate features, and even class-aware logits. Class-aware distillation, in particular, treats the columns of logit matrices as class representations, capturing potential relationships among instances within a batch. However, we argue that representing class embeddings solely as column vectors may not fully capture their inherent properties. In this study, we revisit class-aware knowledge distillation and propose that effective transfer of class-level knowledge requires two regularization strategies: separability and orthogonality. Additionally, we introduce an asymmetric architecture design to further enhance the transfer of class-level knowledge. Together, these components form a new methodology, Class Discriminative Knowledge Distillation (CD-KD). Empirical results demonstrate that CD-KD significantly outperforms several state-of-the-art logit-based and feature-based methods across diverse visual classification tasks, highlighting its effectiveness and robustness.
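The abstract's core idea, treating the columns of a batch logit matrix as class representations and regularizing them for separability and orthogonality, can be illustrated with a minimal NumPy sketch. The paper's exact loss formulations are not given in this abstract, so the `ortho_penalty` and `separability_penalty` below are hypothetical stand-ins chosen only to show the shape of the computation.

```python
import numpy as np

# A batch of student logits: rows are instances, columns are classes.
# Class-aware distillation treats each COLUMN as a class representation,
# capturing how every instance in the batch relates to that class.
rng = np.random.default_rng(0)
batch_size, n_classes = 8, 4
student_logits = rng.normal(size=(batch_size, n_classes))

# Column vectors as class embeddings: one length-`batch_size` vector per class.
class_embed = student_logits.T                      # shape: (n_classes, batch_size)

# Hypothetical orthogonality penalty: push normalized class embeddings toward
# mutually orthogonal directions by driving their Gram matrix to the identity.
normed = class_embed / np.linalg.norm(class_embed, axis=1, keepdims=True)
gram = normed @ normed.T                            # (n_classes, n_classes) cosine similarities
ortho_penalty = np.sum((gram - np.eye(n_classes)) ** 2)

# Hypothetical separability penalty: discourage distinct class embeddings from
# pointing in similar directions by penalizing positive off-diagonal cosines.
off_diag = gram[~np.eye(n_classes, dtype=bool)]
separability_penalty = np.mean(np.maximum(off_diag, 0.0))

print(f"orthogonality penalty: {ortho_penalty:.4f}")
print(f"separability penalty:  {separability_penalty:.4f}")
```

In an actual distillation setup, penalties of this kind would be added to the usual KL-divergence term between teacher and student soft logits; this sketch only demonstrates the column-as-class-embedding view described in the abstract.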
About the Journal
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.