Multi-level semantic-assisted prototype learning for Few-Shot Action Recognition

IF 5.5 | CAS Zone 2, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing | Pub Date: 2025-03-22 | DOI: 10.1016/j.neucom.2025.130022

Dan Liu, Qing Xia, Fanrong Meng, Mao Ye, Jianwei Zhang

Neurocomputing, volume 636, Article 130022. Journal article; not open access. Full text: https://www.sciencedirect.com/science/article/pii/S0925231225006940

Citations: 0

Abstract

Few-Shot Action Recognition (FSAR) is the task of recognizing new action categories from limited labeled data. Conventional fine-tuning-based adaptation is prone to overfitting and lacks temporal modeling for video data. Moreover, the distribution discrepancy between meta-training and meta-test sets can lead to suboptimal performance in few-shot scenarios. This paper introduces a simple yet effective multi-level semantic-assisted prototype learning framework to tackle these challenges. First, we leverage CLIP for multimodal adaptation learning and present a multi-level semantic-assisted learning module that enhances the prototypes of different action classes using semantic information. Second, we integrate lightweight adapters into the CLIP visual encoder to support parameter-efficient transfer learning and improve temporal modeling in videos. Finally, a bias compensation block rectifies features to mitigate the distribution bias in FSAR that stems from data scarcity. Extensive experiments on five standard benchmark datasets demonstrate the effectiveness of the proposed method.
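
The abstract gives no formulas, so the following is only a generic illustration of what "semantic-assisted prototypes" can mean, not the authors' method. It assumes that each class prototype is the mean of its support-video features blended with a text embedding of the class name, and that a query is assigned to the prototype with the highest cosine similarity. The function names, the toy 3-dimensional features, and the mixing weight `alpha` are all hypothetical; in practice the features would come from an encoder such as CLIP.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def build_prototypes(support, text_emb, alpha=0.5):
    """Blend the mean visual feature of each class with its text embedding.

    support:  {class label: list of video feature vectors}
    text_emb: {class label: text feature vector of the class name}
    alpha:    hypothetical mixing weight (assumption, not from the paper)
    """
    protos = {}
    for label, feats in support.items():
        visual = l2_normalize(mean_vector(feats))
        semantic = l2_normalize(text_emb[label])
        protos[label] = [alpha * v + (1 - alpha) * s
                         for v, s in zip(visual, semantic)]
    return protos


def classify(query, protos):
    """Return the label whose prototype is most similar to the query."""
    return max(protos, key=lambda label: cosine(query, protos[label]))


# Toy 2-way 2-shot episode with made-up 3-dimensional features.
support = {
    "jump": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "run": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.1]],
}
text_emb = {"jump": [1.0, 0.0, 0.0], "run": [0.0, 1.0, 0.0]}

prototypes = build_prototypes(support, text_emb)
prediction = classify([0.8, 0.3, 0.0], prototypes)
print(prediction)
```

The adapters and the bias compensation block mentioned in the abstract are not modeled here, since the abstract does not describe how they work.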

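Source journal

Neurocomputing (Engineering & Technology - Computer Science: Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days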

About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice, and applications.