Dual attention focus network for few-shot skeleton-based action recognition

IF 7.6 · Region 1, Computer Science · JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems · Pub Date: 2025-09-27 · DOI: 10.1016/j.knosys.2025.114549

Jie Liu, Chongben Tao, Zhongwei Shen, Cong Wu, Tianyang Xu, Xizhao Luo, Feng Cao, Zhen Gao, Zufeng Zhang, Sai Xu

Volume 330, Article 114549 · https://www.sciencedirect.com/science/article/pii/S0950705125015886

Citations: 0

Abstract


Dual attention focus network for few-shot skeleton-based action recognition

Few-shot action recognition is a challenging yet practically significant problem that involves developing a model capable of learning discriminative features from a small number of labeled samples to recognize new action categories. Current methods typically infer spatial relationships either within or across skeletons to learn action representations, but this often results in features with insufficient discriminability and ineffective attention to critical body parts. To address these limitations, we propose DAF-Net, a novel framework that employs focal attention to jointly model intra-skeleton and inter-skeleton relationships, enhancing discriminative feature learning in few-shot skeleton-based action recognition. Unlike traditional methods that focus solely on intra-skeleton dependencies or inter-skeleton structures, DAF-Net dynamically integrates both components via focal attention, enhancing key body part representation and refining features, particularly in data-scarce conditions. Furthermore, DAF-Net incorporates an enhanced prototype generation strategy, optimizing class prototype formation via cosine similarity weighting to further improve feature discriminability in multi-shot scenarios. In temporal matching, cosine similarity evaluates local feature similarity within skeleton sequences, capturing directional variations of specific joints over time. Extensive experiments on three benchmark datasets (NTU-T, NTU-S, and Kinetics-skeleton) confirm significant performance gains, validating the effectiveness of DAF-Net.
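The abstract names two uses of cosine similarity (weighting support samples when forming a class prototype, and comparing local features during temporal matching) but does not give formulas. The sketch below is one plausible reading, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `cosine`, `weighted_prototype`, and `temporal_match_score`, the softmax weighting against the plain mean of the support set, the `temperature` parameter, and the frame-by-frame alignment of equal-length sequences.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weighted_prototype(support, temperature=1.0):
    """Build a class prototype as a similarity-weighted average of support embeddings.

    Assumption (not from the paper): each support embedding is weighted by a
    softmax over its cosine similarity to the unweighted mean of the support set,
    so outlying shots contribute less than typical ones.
    """
    dim = len(support[0])
    mean = [sum(vec[d] for vec in support) / len(support) for d in range(dim)]
    sims = [cosine(vec, mean) / temperature for vec in support]
    peak = max(sims)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    proto = [sum(w * vec[d] for w, vec in zip(weights, support)) for d in range(dim)]
    return proto, weights


def temporal_match_score(query_seq, proto_seq):
    """Mean frame-wise cosine similarity between two equal-length feature sequences.

    Assumption: frames are compared position by position; any temporal
    alignment step the paper may use is not modeled here.
    """
    if len(query_seq) != len(proto_seq):
        raise ValueError("sequences must have the same number of frames")
    per_frame = [cosine(q, p) for q, p in zip(query_seq, proto_seq)]
    return sum(per_frame) / len(per_frame)


# Three support embeddings for one class (a 3-shot episode); the third is an outlier.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
proto, weights = weighted_prototype(support)
```

Because cosine similarity ignores vector magnitude, the score depends only on the direction of each feature vector, which matches the abstract's remark that temporal matching captures directional variations of joints over time.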


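Source journal

Knowledge-Based Systems (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months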

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.