Title: Class Discriminative Knowledge Distillation
Authors: Shuoxi Zhang; Hanpeng Liu; Yuyi Wang; Kun He; Jun Lin; Yang Zeng
DOI: 10.1109/TETCI.2025.3529896
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 2, pp. 1340-1351
Published: 2025-01-29 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10857646/
Citations: 0
Abstract
Knowledge distillation aims to transfer knowledge from a large teacher model to a lightweight student model, enabling the student to achieve performance comparable to the teacher. Existing methods explore various strategies for distillation, including soft logits, intermediate features, and even class-aware logits. Class-aware distillation, in particular, treats the columns of logit matrices as class representations, capturing potential relationships among instances within a batch. However, we argue that representing class embeddings solely as column vectors may not fully capture their inherent properties. In this study, we revisit class-aware knowledge distillation and propose that effective transfer of class-level knowledge requires two regularization strategies: separability and orthogonality. Additionally, we introduce an asymmetric architecture design to further enhance the transfer of class-level knowledge. Together, these components form a new methodology, Class Discriminative Knowledge Distillation (CD-KD). Empirical results demonstrate that CD-KD significantly outperforms several state-of-the-art logit-based and feature-based methods across diverse visual classification tasks, highlighting its effectiveness and robustness.
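The abstract's core idea, treating the columns of a batch logit matrix as class representations and regularizing them for separability and orthogonality, can be illustrated with a minimal NumPy sketch. The paper's exact loss formulations are not given in this abstract, so the `ortho_penalty` and `separability_penalty` below are hypothetical stand-ins chosen only to show the shape of the computation.

```python
import numpy as np

# A batch of student logits: rows are instances, columns are classes.
# Class-aware distillation treats each COLUMN as a class representation,
# capturing how every instance in the batch relates to that class.
rng = np.random.default_rng(0)
batch_size, n_classes = 8, 4
student_logits = rng.normal(size=(batch_size, n_classes))

# Column vectors as class embeddings: one length-`batch_size` vector per class.
class_embed = student_logits.T                      # shape: (n_classes, batch_size)

# Hypothetical orthogonality penalty: push normalized class embeddings toward
# mutually orthogonal directions by driving their Gram matrix to the identity.
normed = class_embed / np.linalg.norm(class_embed, axis=1, keepdims=True)
gram = normed @ normed.T                            # (n_classes, n_classes) cosine similarities
ortho_penalty = np.sum((gram - np.eye(n_classes)) ** 2)

# Hypothetical separability penalty: discourage distinct class embeddings from
# pointing in similar directions by penalizing positive off-diagonal cosines.
off_diag = gram[~np.eye(n_classes, dtype=bool)]
separability_penalty = np.mean(np.maximum(off_diag, 0.0))

print(f"orthogonality penalty: {ortho_penalty:.4f}")
print(f"separability penalty:  {separability_penalty:.4f}")
```

In an actual distillation setup, penalties of this kind would be added to the usual KL-divergence term between teacher and student soft logits; this sketch only demonstrates the column-as-class-embedding view described in the abstract.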
About the Journal
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.