Reallocating and Evolving General Knowledge for Few-Shot Learning

IF 8.3, Zone 1 (Engineering and Technology), Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Circuits and Systems for Video Technology Pub Date : 2024-08-28 DOI:10.1109/TCSVT.2024.3450861

Yuling Su;Xueliang Liu;Zhen Huang;Jun He;Richang Hong;Meng Wang

Volume 34, Issue 12, Pages 13518-13529 URL: https://ieeexplore.ieee.org/document/10654283/

Citations: 0

Abstract



Large-scale vision-language pre-trained models like CLIP are widely used in few-shot tasks because of their robust generalization capabilities. Existing methods usually incorporate additional techniques to acquire knowledge for new tasks, building upon the general knowledge in CLIP. However, they overlook that task-related knowledge may already be implicitly embedded within this well-learned general knowledge. In this paper, we propose a novel framework to reallocate and evolve the general knowledge for specific few-shot tasks (REGK), mimicking the human “Attention Allocation” cognition mechanism. With a learnable mask-tuning selection, REGK focuses on selecting the task-related parameters of CLIP while learning specific few-shot knowledge without altering CLIP's underlying framework. Specifically, we first observe that inheriting CLIP's strong knowledge representation capability is more advantageous for few-shot learning than inheriting its task-solving ability. Subsequently, a two-stage tuning framework is introduced to reallocate and control the mask-tuning on different tasks. It allows the model to automatically apply mask-tuning to different few-shot tasks with selective sparsity training. In this way, we achieve reliable transfer of task-related knowledge and effective exploration of new knowledge from limited data to enhance few-shot learning. Extensive experiments validate the superiority and potential of our model.
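The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind mask-tuning: a binary mask is learned over frozen pre-trained weights so that a task uses only the parameters relevant to it. It is not the REGK method. The toy linear model, the data, the names (`FROZEN_W`, `SAMPLES`, `train_mask`), the hyperparameters, and the straight-through gradient are all assumptions made for illustration, and the sketch omits the two-stage tuning and sparsity control the abstract mentions.

```python
# Hypothetical toy example: names, data, and hyperparameters are illustrative
# assumptions and are not taken from the REGK paper.

FROZEN_W = [2.0, -1.0, 0.5, 3.0]  # stands in for frozen pre-trained weights

# Four "few-shot" samples (x, y); the task depends only on features 0 and 3.
SAMPLES = [
    ([1.0, 0.0, 0.0, 0.0], 2.0),
    ([0.0, 1.0, 0.0, 0.0], 0.0),
    ([0.0, 0.0, 1.0, 0.0], 0.0),
    ([0.0, 0.0, 0.0, 1.0], 3.0),
]


def binary_mask(scores):
    """Hard threshold: keep a weight only if its score is positive."""
    return [1.0 if s > 0.0 else 0.0 for s in scores]


def predict(x, mask):
    """Linear model that uses only the weights selected by the mask."""
    return sum(w * m * xi for w, m, xi in zip(FROZEN_W, mask, x))


def mse(mask):
    """Mean squared error of the masked model on the few-shot samples."""
    return sum((predict(x, mask) - y) ** 2 for x, y in SAMPLES) / len(SAMPLES)


def train_mask(steps=10, lr=0.5, init_score=0.1):
    """Learn mask scores by gradient descent; the weights are never updated."""
    scores = [init_score] * len(FROZEN_W)
    for _ in range(steps):
        mask = binary_mask(scores)
        grads = [0.0] * len(scores)
        for x, y in SAMPLES:
            err = predict(x, mask) - y
            for i, (w, xi) in enumerate(zip(FROZEN_W, x)):
                # Straight-through estimator: treat d(mask)/d(score) as 1.
                grads[i] += 2.0 * err * w * xi / len(SAMPLES)
        scores = [s - lr * g for s, g in zip(scores, grads)]
    return binary_mask(scores), scores


if __name__ == "__main__":
    mask, _ = train_mask()
    print("learned mask:", mask)
    print("loss with all weights:", mse([1.0] * len(FROZEN_W)))
    print("loss with learned mask:", mse(mask))
```

In this toy setting the mask learns to switch off the two weights that hurt the task, while the frozen weights themselves stay unchanged. That mirrors, in a much simplified form, the abstract's point that task-related knowledge can be selected from general knowledge instead of being added on top of it.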


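Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering and Technology: Electrical and Electronic Engineering)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months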

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.