Yawen Cui;Jian Zhao;Zitong Yu;Rizhao Cai;Xun Wang;Lei Jin;Alex C. Kot;Li Liu;Xuelong Li
IEEE Transactions on Multimedia, vol. 27, pp. 5533-5547. Published 2025-02-18. DOI: 10.1109/TMM.2025.3543038. Journal Article. Impact factor: 9.7; JCR: Q1 (Computer Science, Information Systems). Source: https://ieeexplore.ieee.org/document/10891550/
CMoA: Contrastive Mixture of Adapters for Generalized Few-Shot Continual Learning
Few-Shot Continual Learning (FSCL) aims to learn novel tasks incrementally from limited labeled samples while preserving previously acquired capabilities. However, existing FSCL work has not studied domain increments or domain generalization, so current methods cannot cope with changes in the visual perception environment. In this paper, we set up a Generalized FSCL (GFSCL) protocol that involves both class- and domain-incremental scenarios together with a domain generalization assessment. First, we arrange two new benchmark datasets and protocols and provide detailed baselines for this unexplored configuration. We find that common continual learning methods generalize poorly to unseen domains and do not adequately address catastrophic forgetting in cross-incremental tasks. Hence, we propose a rehearsal-free framework based on Vision Transformer (ViT) named Contrastive Mixture of Adapters (CMoA). It contains two non-conflicting parts: (1) Exploiting the fast-adaptation characteristic of adapter-embedded ViT, a Mixture of Adapters (MoA) module is incorporated into ViT. For stability, cosine similarity regularization and dynamic weighting are designed so that each adapter learns specific knowledge and concentrates on particular classes. (2) To further enhance domain generalization, we reduce intra-class variation through prototype-calibrated contrastive learning, which improves domain-invariant representation learning. Finally, comprehensive experiments on the two benchmark datasets compare six evaluation indicators covering overall performance and forgetting to validate the efficacy of CMoA, and the results show that CMoA achieves performance comparable to rehearsal-based continual learning methods.
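The abstract names three ingredients without giving formulas: a weighted mixture of adapter outputs, a cosine similarity regularizer across adapters, and a prototype-based contrastive objective. The sketch below is only one plausible reading of those terms, not the paper's formulation. The function names, the softmax gating, the mean pairwise similarity penalty, and the `temperature` value are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_adapters(adapter_outputs, gate_scores):
    """Assumed mixture: softmax-weighted sum of the adapter output vectors."""
    weights = softmax(gate_scores)
    dim = len(adapter_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, adapter_outputs))
        for i in range(dim)
    ]


def similarity_penalty(adapter_outputs):
    """Assumed regularizer: mean pairwise cosine similarity between adapters.

    Minimizing it pushes adapters toward distinct directions.
    """
    n = len(adapter_outputs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(cosine(adapter_outputs[i], adapter_outputs[j]) for i, j in pairs)
    return total / len(pairs)


def prototype_contrastive_loss(feature, prototypes, label, temperature=0.1):
    """Assumed objective: cross-entropy over cosine similarities to prototypes.

    Pulls a feature toward its own class prototype and away from the others.
    """
    logits = [cosine(feature, p) / temperature for p in prototypes]
    return -math.log(softmax(logits)[label])


if __name__ == "__main__":
    outputs = [[1.0, 0.0], [0.0, 1.0]]      # two hypothetical adapter outputs
    print(mix_adapters(outputs, [0.0, 0.0]))
    print(similarity_penalty(outputs))
    print(prototype_contrastive_loss([1.0, 0.0], outputs, label=0))
```

In this reading, the penalty is zero for orthogonal adapter outputs and one for parallel ones, and the contrastive loss is small when a feature already aligns with its class prototype. How CMoA actually computes its dynamic weights and calibrates its prototypes is not stated in the abstract.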
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.