{"title":"Complementary Learning Subnetworks Towards Parameter-Efficient Class-Incremental Learning","authors":"Depeng Li;Zhigang Zeng;Wei Dai;Ponnuthurai Nagaratnam Suganthan","doi":"10.1109/TKDE.2025.3550809","DOIUrl":null,"url":null,"abstract":"In the scenario of class-incremental learning (CIL), deep neural networks have to adapt their model parameters to non-stationary data distributions, e.g., the emergence of new classes over time. To mitigate the catastrophic forgetting phenomenon, typical CIL methods either cumulatively store exemplars of old classes for retraining model parameters from scratch or progressively expand model size as new classes arrive, which, however, compromises their practical value due to little attention paid to <italic>parameter efficiency</i>. In this paper, we contribute a novel solution, effective control of the parameters of a well-trained model, by the synergy between two complementary learning subnetworks. Specifically, we integrate one plastic feature extractor and one analytical feed-forward classifier into a unified framework amenable to streaming data. In each CIL session, it achieves non-overwritten parameter updates in a cost-effective manner, neither revisiting old task data nor extending previously learned networks; Instead, it accommodates new tasks by attaching a tiny set of declarative parameters to its backbone, in which only one matrix per task or one vector per class is kept for knowledge retention. Experimental results on a variety of task sequences demonstrate that our method achieves competitive results against state-of-the-art CIL approaches, especially in accuracy gain, knowledge transfer, training efficiency, and task-order robustness. Furthermore, a graceful forgetting implementation on previously learned trivial tasks is empirically investigated to make its non-growing backbone (i.e., a model with limited network capacity) suffice to train on more incoming tasks.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"37 6","pages":"3240-3252"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Knowledge and Data Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10924453/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In the scenario of class-incremental learning (CIL), deep neural networks have to adapt their model parameters to non-stationary data distributions, e.g., the emergence of new classes over time. To mitigate catastrophic forgetting, typical CIL methods either cumulatively store exemplars of old classes for retraining model parameters from scratch or progressively expand the model as new classes arrive, which compromises their practical value because little attention is paid to parameter efficiency. In this paper, we contribute a novel solution that keeps the parameters of a well-trained model under effective control through the synergy of two complementary learning subnetworks. Specifically, we integrate a plastic feature extractor and an analytical feed-forward classifier into a unified framework amenable to streaming data. In each CIL session, the framework achieves non-overwriting parameter updates in a cost-effective manner, neither revisiting old task data nor extending previously learned networks. Instead, it accommodates new tasks by attaching a tiny set of declarative parameters to its backbone, keeping only one matrix per task or one vector per class for knowledge retention. Experimental results on a variety of task sequences demonstrate that our method achieves competitive results against state-of-the-art CIL approaches, especially in accuracy gain, knowledge transfer, training efficiency, and task-order robustness. Furthermore, we empirically investigate graceful forgetting of previously learned trivial tasks so that the non-growing backbone (i.e., a model with limited network capacity) suffices to train on more incoming tasks.
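As a rough illustration of the kind of update an analytical feed-forward classifier can perform across CIL sessions without revisiting old data, the sketch below implements a plain ridge-regression head over features from a frozen backbone. It is a minimal sketch under stated assumptions, not the authors' exact formulation: the class name, the regularization constant, and the choice to store a feature Gram matrix plus per-class feature sums are illustrative stand-ins for the "one matrix per task or one vector per class" retention described in the abstract.

```python
import numpy as np

class AnalyticIncrementalClassifier:
    """Illustrative sketch (not the paper's exact method): a ridge-regression
    classifier on top of a frozen feature extractor, updated session by session
    without revisiting old data. Each session only accumulates a feature Gram
    matrix and per-class feature sums, so earlier statistics are never overwritten."""

    def __init__(self, feat_dim, reg=1e-3):
        self.reg = reg                              # assumed regularization strength
        self.G = np.zeros((feat_dim, feat_dim))     # accumulated Gram matrix X^T X
        self.class_sums = {}                        # class id -> summed features (X^T Y column)

    def update(self, feats, labels):
        # feats: (n, d) features from the frozen backbone; labels: (n,) integer class ids
        self.G += feats.T @ feats
        for c in np.unique(labels):
            s = feats[labels == c].sum(axis=0)
            self.class_sums[c] = self.class_sums.get(c, 0) + s

    def weights(self):
        # Closed-form ridge solution W = (X^T X + reg*I)^{-1} X^T Y,
        # where Y is the one-hot label matrix over all classes seen so far.
        classes = sorted(self.class_sums)
        C = np.stack([self.class_sums[c] for c in classes], axis=1)   # (d, k)
        W = np.linalg.solve(self.G + self.reg * np.eye(self.G.shape[0]), C)
        return classes, W

    def predict(self, feats):
        classes, W = self.weights()
        return np.asarray(classes)[np.argmax(feats @ W, axis=1)]
```

Because each session only adds to the Gram matrix and the per-class sums, the closed-form solve never overwrites statistics gathered in earlier sessions; this mirrors, in a simplified form, the non-overwriting update property the abstract attributes to its analytical classifier.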
About the Journal
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.