Multi-level semantic-assisted prototype learning for Few-Shot Action Recognition
Dan Liu, Qing Xia, Fanrong Meng, Mao Ye, Jianwei Zhang
Neurocomputing, Volume 636, Article 130022. Published 2025-03-22. DOI: 10.1016/j.neucom.2025.130022
https://www.sciencedirect.com/science/article/pii/S0925231225006940
Citations: 0
Abstract
The Few-Shot Action Recognition (FSAR) task involves recognizing new action categories from limited labeled data. Conventional fine-tuning-based adaptation is prone to overfitting and lacks temporal modeling for video data. Moreover, the distribution discrepancy between the meta-training and meta-test sets can lead to suboptimal performance in few-shot scenarios. This paper introduces a simple yet effective multi-level semantic-assisted prototype learning framework to tackle these challenges. First, we leverage CLIP for multimodal adaptation learning and present a multi-level semantic-assisted learning module that enhances the prototypes of different action classes with semantic information. Additionally, we integrate lightweight adapters into the CLIP visual encoder to support parameter-efficient transfer learning and improve temporal modeling in videos. In particular, a bias compensation block performs feature rectification to mitigate the distribution bias in FSAR stemming from data scarcity. Extensive experiments on five standard benchmark datasets demonstrate the effectiveness of the proposed method.
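To make the prototype-learning idea concrete, below is a minimal sketch of how semantic-assisted prototypes might be formed in a few-shot episode. The mean-pooling over support shots, the fusion weight `alpha`, and the cosine-similarity classifier with temperature `tau` are illustrative assumptions, not the paper's exact design; the adapters and bias compensation block are omitted.

```python
# Hypothetical sketch: fuse visual class prototypes with CLIP text embeddings,
# then classify queries by cosine similarity. All design choices are assumed.
import torch
import torch.nn.functional as F

def semantic_assisted_prototypes(support_feats, text_embeds, alpha=0.5):
    """Enhance visual prototypes with class-name semantics.

    support_feats: (n_way, k_shot, d) video features from a CLIP-like visual encoder.
    text_embeds:   (n_way, d) text embeddings of the class names/descriptions.
    alpha:         assumed visual/semantic fusion weight.
    """
    visual_proto = support_feats.mean(dim=1)          # (n_way, d): average over shots
    visual_proto = F.normalize(visual_proto, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    proto = alpha * visual_proto + (1 - alpha) * text_embeds  # semantic enhancement
    return F.normalize(proto, dim=-1)

def classify_queries(query_feats, prototypes, tau=0.07):
    """Cosine-similarity logits between query features and prototypes."""
    q = F.normalize(query_feats, dim=-1)              # (n_query, d)
    return q @ prototypes.t() / tau                   # (n_query, n_way)

# Toy 5-way 1-shot episode; random tensors stand in for real encoder outputs.
n_way, k_shot, d = 5, 1, 512
support = torch.randn(n_way, k_shot, d)
text = torch.randn(n_way, d)
queries = torch.randn(10, d)
logits = classify_queries(queries, semantic_assisted_prototypes(support, text))
print(logits.argmax(dim=-1))  # predicted class index per query
```

In this reading, the semantic branch acts as a prior that stabilizes prototypes when only one or a few support videos per class are available.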
Journal Introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics span neurocomputing theory, practice, and applications.