Exemplar-Free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation

Marco Cotogni, Fei Yang, Claudio Cusano, Andrew D. Bagdanov, Joost van de Weijer

International Journal of Computer Vision, published 2025-03-13. DOI: 10.1007/s11263-025-02374-x
Abstract
Vision transformers (ViTs) have achieved remarkable successes across a broad range of computer vision applications. As a consequence, there has been increasing interest in extending continual learning theory and techniques to ViT architectures. We propose a new method for exemplar-free class-incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay, which can help recalibrate previous task classifiers to the feature drift that occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks, which for many applications may not be possible. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to the class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Secondly, we propose a new method of feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal-based ViT methods. (Code: https://github.com/OcraM17/GCAB-CFDC)
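To make the gating idea concrete, below is a minimal PyTorch-style sketch of what mask-based gating on a class-attention block could look like, assuming a CaiT-style block in which only the class token attends to the patch tokens. The module name `GatedClassAttention`, the per-channel sigmoid gate, and its placement on the attention output are illustrative assumptions and are not taken from the authors' released code.

```python
# Minimal sketch of a gated class-attention block (illustrative only; the gate
# parameterization and placement are assumptions, not the paper's implementation).
import torch
import torch.nn as nn


class GatedClassAttention(nn.Module):
    """Class-attention (CaiT-style): only the class token queries the patch tokens.
    A learnable gate modulates the attention output so that channels deemed
    important for earlier tasks can be softly frozen when new tasks arrive."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.q = nn.Linear(dim, dim)       # query from the class token only
        self.kv = nn.Linear(dim, dim * 2)  # keys/values from all tokens
        self.proj = nn.Linear(dim, dim)

        # One gate logit per channel; sigmoid(gate) in (0, 1) scales the output.
        self.gate = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1 + num_patches, dim); x[:, :1] is the class token.
        B, N, D = x.shape
        q = self.q(x[:, :1]).reshape(B, 1, self.num_heads, self.head_dim).transpose(1, 2)
        kv = self.kv(x).reshape(B, N, 2, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        k, v = kv[0], kv[1]

        attn = (q @ k.transpose(-2, -1)) * self.scale   # (B, heads, 1, N)
        attn = attn.softmax(dim=-1)
        cls = (attn @ v).transpose(1, 2).reshape(B, 1, D)

        # Mask-based gating: channels whose gates approach zero are suppressed,
        # limiting drift of the class-token representation across tasks.
        return self.proj(cls) * torch.sigmoid(self.gate)
```

For a ViT-Small-sized embedding (dim = 384), `GatedClassAttention(384)(x)` maps a (batch, 1 + patches, 384) token sequence to the gated class-token output; gates that end up near zero effectively freeze the corresponding channels for subsequent tasks.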
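The feature drift compensation idea can likewise be illustrated with a small sketch. The version below follows the general prototype-drift recipe (estimate how current-task features move during training and shift old class prototypes accordingly). The function name, the Gaussian weighting, and the single-step correction are assumptions made for illustration; the paper's cascaded variant, which handles drift accumulated over a sequence of tasks, is not modelled here.

```python
# Illustrative sketch of prototype-based feature drift compensation (not the
# paper's exact cascaded procedure). Old class prototypes are shifted by the
# drift measured on current-task features before and after training on the
# new task, weighted by proximity to each prototype.
import torch


def compensate_prototypes(old_protos: torch.Tensor,
                          feats_before: torch.Tensor,
                          feats_after: torch.Tensor,
                          sigma: float = 0.3) -> torch.Tensor:
    """old_protos: (C, D) prototypes of previous classes (old feature space).
    feats_before / feats_after: (N, D) features of current-task samples extracted
    with the backbone before and after training on the new task."""
    drift = feats_after - feats_before                    # (N, D) per-sample drift
    # Similarity of each current sample to each old prototype (old space).
    dist2 = torch.cdist(old_protos, feats_before).pow(2)  # (C, N) squared distances
    w = torch.exp(-dist2 / (2 * sigma ** 2))              # (C, N) Gaussian weights
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-8)    # normalize per prototype
    # Each prototype moves by the weighted average drift of nearby samples.
    return old_protos + w @ drift                         # (C, D) compensated protos
```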
Journal Introduction:
The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually and presents high-quality, original contributions to the science and engineering of computer vision. The journal encompasses various types of articles to cater to different research outputs.
Regular articles, which span up to 25 journal pages, focus on significant technical advancements that are of broad interest to the field. These articles showcase substantial progress in computer vision.
Short articles, limited to 10 pages, offer a swift publication path for novel research outcomes. They provide a quicker means for sharing new findings with the computer vision community.
Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art in computer vision or tutorial presentations of relevant topics. These articles provide comprehensive and insightful overviews of specific subject areas.
In addition to technical articles, the journal also includes book reviews, position papers, and editorials by prominent scientific figures. These contributions serve to complement the technical content and provide valuable perspectives.
The journal encourages authors to include supplementary material online, such as images, video sequences, data sets, and software. This additional material enhances the understanding and reproducibility of the published research.
Overall, the International Journal of Computer Vision is a comprehensive publication that caters to researchers in this rapidly growing field. It covers a range of article types, offers additional online resources, and facilitates the dissemination of impactful research.