Few-shot event-based action recognition

IF 6.0, Zone 1 (Computer Science), JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Networks, Pub Date: 2025-06-21, DOI: 10.1016/j.neunet.2025.107750

Zanxi Ruan, Nan Pu, Jiangming Chen, Songqun Gao, Yanming Guo, Qiuyu Kong, Yuxiang Xie, Yingmei Wei

Neural Networks, volume 191, Article 107750. https://www.sciencedirect.com/science/article/pii/S0893608025006306

Citations: 0

Abstract

Despite the evident superiority of event cameras in practical vision applications (e.g., action recognition), owing to their distinctive sensing mechanism, existing event-based action recognition methods rely heavily on large-scale training data. However, the expensive cost of camera deployment and the requirement of data privacy protection make it challenging to collect substantial data in real-world scenarios. To address this limitation, we explore a novel yet practical task, Few-Shot Event-Based Action Recognition (FSEAR), which aims at leveraging a minimal number of intractable event action data for model training and accurately classifying unlabeled data into a specific category. Accordingly, we design a new framework for FSEAR, including a Noise-Aware Event Encoder (NAE) and a Distilled Prototypical Distance Fusion (DPDF). The former efficiently filters noise within the spatiotemporal domain while retaining vital information related to action timing. The latter conducts multi-scale measurements across geometric, directional, and distributional dimensions. These two modules benefit mutually and thus effectively exploit the potential characteristics of event data. Extensive experiments on four distinct event action recognition datasets have demonstrated the significant advantages of our model over other few-shot learning methods. Our code and models will be publicly released.
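The abstract says only that the distance fusion measures similarity across geometric, directional, and distributional dimensions; it gives no formulas. As a rough illustration of that general idea, and not the authors' DPDF module, the sketch below classifies a query embedding against class prototypes using an equally weighted sum of three distances. The specific choices here are assumptions made for illustration: Euclidean distance as the geometric term, cosine distance as the directional term, KL divergence between softmax-normalized embeddings as the distributional term, and the hypothetical names `fused_distance` and `classify`.

```python
import math

def mean_vector(vectors):
    """Class prototype: element-wise mean of the support embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    """Geometric term (assumed): straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Directional term (assumed): 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0

def softmax(v):
    """Turn an embedding into a probability distribution."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(a, b, eps=1e-12):
    """Distributional term (assumed): KL(softmax(a) || softmax(b))."""
    p, q = softmax(a), softmax(b)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def fused_distance(query, prototype, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three distances; the weights are illustrative."""
    terms = (
        euclidean(query, prototype),
        cosine_distance(query, prototype),
        kl_divergence(query, prototype),
    )
    return sum(w * t for w, t in zip(weights, terms))

def classify(query, support):
    """Return the label whose prototype is closest under the fused distance.

    `support` maps each class label to its few labeled embeddings.
    """
    prototypes = {label: mean_vector(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: fused_distance(query, prototypes[label]))
```

With a 2-way 2-shot support set such as `{"wave": [[1.0, 0.0], [0.9, 0.1]], "jump": [[0.0, 1.0], [0.1, 0.9]]}`, calling `classify([0.8, 0.2], support)` selects the nearest prototype. The three terms live on different scales, so a practical system would likely normalize or learn the weights rather than fix them.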

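Source journal

Neural Networks (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.90

Self-citation rate: 7.70%

Articles published: 425

Review time: 67 days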

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.