Yuling Su;Xueliang Liu;Zhen Huang;Jun He;Richang Hong;Meng Wang
{"title":"Reallocating and Evolving General Knowledge for Few-Shot Learning","authors":"Yuling Su;Xueliang Liu;Zhen Huang;Jun He;Richang Hong;Meng Wang","doi":"10.1109/TCSVT.2024.3450861","DOIUrl":null,"url":null,"abstract":"Large-scale vision-language pre-trained models like CLIP are extensively employed in few-shot tasks due to their robust generalization capabilities. Existing methods usually incorporate additional techniques to acquire knowledge for new tasks building upon the general knowledge in CLIP. However, they do not realize that the task-related knowledge might be implicitly embedded within the general knowledge well-learned. In this paper, we propose a novel framework to reallocate and evolve the general knowledge for specific few-shot tasks (REGK), mimicking the human “Attention Allocation” cognition mechanism. With a learnable mask-tuning selection, REGK focuses on selecting the task-related parameters of CLIP while learning specific few-shot knowledge without altering CLIP underlying framework. Specifically, we initially observe that inheriting the strong knowledge representation capability in CLIP is more advantageous for few-shot learning than its task-solving ability. Subsequently, a two-stage tuning framework is introduced to reallocate and control the mask-tuning on different tasks. It allows model automatically mask-tuning on different few-shot tasks with selective sparsity training. In this way, we achieve reliable transfer of task-related knowledge and effective exploration of new knowledge from limited data to enhance few-shot learning. Extensive experiments validate the superiority and potentiality of our model.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"13518-13529"},"PeriodicalIF":8.3000,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10654283/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
Large-scale vision-language pre-trained models like CLIP are extensively employed in few-shot tasks due to their robust generalization capabilities. Existing methods usually incorporate additional techniques to acquire knowledge for new tasks, building upon the general knowledge in CLIP. However, they do not recognize that task-related knowledge may already be implicitly embedded within the well-learned general knowledge. In this paper, we propose a novel framework that reallocates and evolves the general knowledge for specific few-shot tasks (REGK), mimicking the human “Attention Allocation” cognitive mechanism. With a learnable mask-tuning selection, REGK focuses on selecting the task-related parameters of CLIP while learning specific few-shot knowledge, without altering CLIP's underlying framework. Specifically, we first observe that inheriting CLIP's strong knowledge-representation capability is more advantageous for few-shot learning than inheriting its task-solving ability. We then introduce a two-stage tuning framework to reallocate and control the mask-tuning across different tasks, allowing the model to automatically mask-tune on different few-shot tasks through selective sparsity training. In this way, we achieve reliable transfer of task-related knowledge and effective exploration of new knowledge from limited data to enhance few-shot learning. Extensive experiments validate the superiority and potential of our model.
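The abstract describes REGK only at a high level. For intuition, below is a minimal, hypothetical sketch of the general mask-tuning idea: learnable masks select which frozen pretrained parameters receive a task-specific update, with a sparsity penalty limiting how many parameters are selected. All names (MaskedLinear, delta, mask_logits, sparsity_loss) and the sigmoid relaxation are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of learnable mask-tuning over frozen pretrained weights.
# Names and design choices here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaskedLinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a learnable parameter mask.

    Only entries selected by the (relaxed) binary mask receive a learned
    update; the remaining weights keep their pretrained values, so general
    knowledge is preserved while task-related parameters are tuned.
    """

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        # Frozen pretrained weights (general knowledge).
        self.weight = nn.Parameter(pretrained.weight.detach().clone(), requires_grad=False)
        self.bias = (nn.Parameter(pretrained.bias.detach().clone(), requires_grad=False)
                     if pretrained.bias is not None else None)
        # Learnable per-entry update and mask scores.
        self.delta = nn.Parameter(torch.zeros_like(self.weight))
        self.mask_logits = nn.Parameter(torch.zeros_like(self.weight))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid relaxation of a binary mask; a hard top-k or straight-through
        # estimator could enforce an explicit sparsity budget instead.
        mask = torch.sigmoid(self.mask_logits)
        weight = self.weight + mask * self.delta
        return F.linear(x, weight, self.bias)

    def sparsity_loss(self) -> torch.Tensor:
        # Penalty encouraging the mask to select only a few task-related entries.
        return torch.sigmoid(self.mask_logits).mean()


# Usage sketch: wrap a layer, then add the sparsity term to the few-shot loss.
layer = MaskedLinear(nn.Linear(512, 512))
x = torch.randn(4, 512)
out = layer(x)
loss = out.pow(2).mean() + 0.01 * layer.sparsity_loss()  # placeholder task loss
loss.backward()
```

Note that this sketch covers only the single-layer masking idea; the two-stage procedure that reallocates and controls the mask-tuning across tasks, as described in the abstract, is not shown here.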
About the Journal
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.