Ruiqi Liu , Boyu Diao , Libo Huang , Zijia An , Hangda Liu , Zhulin An , Yongjun Xu
{"title":"Low-redundancy distillation for continual learning","authors":"Ruiqi Liu , Boyu Diao , Libo Huang , Zijia An , Hangda Liu , Zhulin An , Yongjun Xu","doi":"10.1016/j.patcog.2025.111712","DOIUrl":null,"url":null,"abstract":"<div><div>Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical application. Drawing inspiration from the brain’s contextual gating mechanism, which selectively filters neural information and continuously updates past memories, we propose Low-redundancy Distillation (LoRD), a novel CL method that enhances model performance while maintaining training efficiency. This is achieved by eliminating redundancy in three aspects of CL: student model redundancy, teacher model redundancy, and rehearsal sample redundancy. By compressing the learnable parameters of the student model and pruning the teacher model, LoRD facilitates the retention and optimization of prior knowledge, effectively decoupling task-specific knowledge without manually assigning isolated parameters for each task. Furthermore, we optimize the selection of rehearsal samples and refine rehearsal frequency to improve training efficiency. Through a meticulous design of distillation and rehearsal strategies, LoRD effectively balances training efficiency and model precision. Extensive experimentation across various benchmark datasets and environments demonstrates LoRD’s superiority, achieving the highest accuracy with the lowest training FLOPs.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111712"},"PeriodicalIF":7.5000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325003723","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical application. Drawing inspiration from the brain’s contextual gating mechanism, which selectively filters neural information and continuously updates past memories, we propose Low-redundancy Distillation (LoRD), a novel CL method that enhances model performance while maintaining training efficiency. This is achieved by eliminating redundancy in three aspects of CL: student model redundancy, teacher model redundancy, and rehearsal sample redundancy. By compressing the learnable parameters of the student model and pruning the teacher model, LoRD facilitates the retention and optimization of prior knowledge, effectively decoupling task-specific knowledge without manually assigning isolated parameters for each task. Furthermore, we optimize the selection of rehearsal samples and refine rehearsal frequency to improve training efficiency. Through a meticulous design of distillation and rehearsal strategies, LoRD effectively balances training efficiency and model precision. Extensive experimentation across various benchmark datasets and environments demonstrates LoRD’s superiority, achieving the highest accuracy with the lowest training FLOPs.
期刊介绍:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.