Intra-class progressive and adaptive self-distillation

Jianping Gou, Jiaye Lin, Lin Li, Weihua Ou, Baosheng Yu, Zhang Yi

Neural Networks, Volume 188, Article 107404 (published 2025-03-21)
DOI: 10.1016/j.neunet.2025.107404
https://www.sciencedirect.com/science/article/pii/S0893608025002837
Abstract
In recent years, knowledge distillation (KD) has become widely used for model compression, training compact and efficient students to reduce the computational load and training time caused by the growing number of parameters in deep neural networks. Because methods such as offline KD and online KD require pre-trained teachers or multiple networks, self-distillation has been proposed to minimize training costs. However, existing self-distillation methods often overlook feature knowledge and category information. In this paper, we introduce Intra-class Progressive and Adaptive Self-Distillation (IPASD), which transfers knowledge from an earlier epoch to the following one across adjacent epochs. The method extracts class-typical features and promotes intra-class compactness. By integrating feature-level and logit-level knowledge into strong teacher knowledge and using ground-truth labels as supervision signals, we adaptively optimize the model. We evaluated IPASD on CIFAR-10, CIFAR-100, Tiny ImageNet, Plant Village, and ImageNet, showing its superiority over state-of-the-art self-distillation methods in knowledge transfer and model compression. Our code is available at: https://github.com/JLinye/IPASD.
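The abstract describes the method only at a high level. The sketch below is a minimal illustration of the general idea of epoch-to-epoch self-distillation with combined logit-level and feature-level losses plus ground-truth supervision; the specific loss choices (KL divergence on softened logits, MSE on features), the hyper-parameters, and the assumption that the model returns both logits and features are illustrative assumptions and are not taken from the IPASD paper itself.

```python
import copy
import torch
import torch.nn.functional as F

def self_distillation_step(model, prev_model, images, labels,
                           temperature=4.0, alpha=0.5, beta=0.1):
    """One training step of epoch-to-epoch self-distillation.

    `prev_model` is a frozen snapshot of the same network from the
    previous epoch and acts as the teacher. The loss terms and weights
    here are assumptions for illustration, not the exact IPASD
    formulation.
    """
    logits, feats = model(images)            # assumed: model returns (logits, features)
    with torch.no_grad():
        t_logits, t_feats = prev_model(images)

    # Supervised loss against ground-truth labels.
    ce = F.cross_entropy(logits, labels)

    # Logit-level distillation from the previous-epoch snapshot.
    kd = F.kl_div(
        F.log_softmax(logits / temperature, dim=1),
        F.softmax(t_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Feature-level distillation (simple MSE stand-in).
    feat_loss = F.mse_loss(feats, t_feats)

    return ce + alpha * kd + beta * feat_loss


# After each epoch, refresh the frozen teacher snapshot:
# prev_model = copy.deepcopy(model).eval()
# for p in prev_model.parameters():
#     p.requires_grad_(False)
```

In this sketch the teacher costs no extra network capacity: it is simply the student's own weights from the previous epoch, which is what allows self-distillation to avoid the pre-trained teachers or parallel peer networks used by offline and online KD.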
Journal Introduction
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.