Correlation-Based Knowledge Distillation in Exemplar-Free Class-Incremental Learning
Authors: Zijian Gao; Bo Liu; Kele Xu; Xinjun Mao; Huaimin Wang
Journal: IEEE Open Journal of the Computer Society, vol. 6, pp. 449-459
Published: 2025-02-28
DOI: 10.1109/OJCS.2025.3546754
URL: https://ieeexplore.ieee.org/document/10908063/
Citation count: 0
Abstract
Class-incremental learning (CIL) aims to learn a set of classes incrementally, with data arriving in sequence rather than being available for training all at once. A main difficulty in CIL is that standard deep neural networks suffer from catastrophic forgetting (CF), especially when the model only has access to data from the current incremental step. Knowledge distillation (KD) is a widely used technique that uses the old model as a teacher to alleviate CF. However, a case study in our investigation reveals that vanilla KD, which imposes a strict point-to-point constraint, is insufficient. Instead, a relaxed match between the teacher and student improves distillation performance and model stability. In this article, we propose a simple yet effective method to mitigate CF without additional training cost and without requiring any exemplars. Specifically, we measure the distillation loss using the linear correlation between the teacher's and student's features rather than a vanilla point-to-point loss, which significantly improves model stability. We then use label augmentation to improve feature generalization and save prototypes to further alleviate classification bias. The proposed method significantly outperforms state-of-the-art methods across various benchmark settings, including CIFAR-100 and Tiny-ImageNet, demonstrating its effectiveness and robustness.
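The contrast the abstract draws between a point-to-point loss and a correlation-based one can be illustrated with a small sketch. The abstract does not give the paper's exact loss, so everything here is an assumption made for illustration: the function names `pearson_distill_loss` and `pointwise_distill_loss`, the use of the per-sample Pearson correlation along the feature dimension, and the `1 - correlation` form. The sketch shows only why a correlation measure is a more relaxed match: student features that are a scaled and shifted copy of the teacher's incur almost no correlation loss, while a mean-squared-error loss still penalizes them.

```python
import numpy as np

def pearson_distill_loss(student, teacher, eps=1e-8):
    """Hypothetical loss: 1 - Pearson correlation, averaged over samples.

    Both inputs have shape (batch, feature_dim). The correlation is
    computed per sample along the feature dimension.
    """
    s = student - student.mean(axis=1, keepdims=True)  # center each sample
    t = teacher - teacher.mean(axis=1, keepdims=True)
    num = (s * t).sum(axis=1)
    den = np.linalg.norm(s, axis=1) * np.linalg.norm(t, axis=1) + eps
    return float(np.mean(1.0 - num / den))

def pointwise_distill_loss(student, teacher):
    """Vanilla point-to-point baseline: mean squared error on features."""
    return float(np.mean((student - teacher) ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 16))   # stand-in for old-model features
student = 2.0 * teacher + 0.5        # same structure, different scale and offset

corr_loss = pearson_distill_loss(student, teacher)
mse_loss = pointwise_distill_loss(student, teacher)
print(f"correlation loss: {corr_loss:.2e}, point-to-point loss: {mse_loss:.3f}")
```

In this toy setting the correlation loss is close to zero because the student's features are a positive linear function of the teacher's, whereas the point-to-point loss stays clearly positive. Whether the paper computes the correlation per sample, per feature channel, or across the batch is not stated in the abstract.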