Faster-slow network fused with enhanced fine-grained features for action recognition

IF 2.6 | Zone 4, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation | Pub Date: 2024-10-30 | DOI: 10.1016/j.jvcir.2024.104328

Xuegang Wu, Jiawei Zhu, Liu Yang

Volume 105, Article 104328. Full text: https://www.sciencedirect.com/science/article/pii/S1047320324002840

Citations: 0


Faster-slow network fused with enhanced fine-grained features for action recognition

Two-stream methods, which visually separate human actions and backgrounds into temporal and spatial streams, have shown promising results on action recognition datasets. However, prior research emphasizes motion modeling but overlooks the strong correlation between motion features and spatial information, which restricts a model's ability to recognize behaviors that involve occlusion or rapid change. We therefore introduce Faster-slow, an improved framework for frame-level motion features. It adds a Behavioural Feature Enhancement (BFE) module to a novel two-stream network whose streams operate at different temporal resolutions. BFE consists of two components: MM, which uses motion-aware attention to capture dependencies between adjacent frames, and STC, which enhances spatio-temporal and channel information to generate optimized features. Overall, BFE facilitates the extraction of finer-grained motion information while ensuring stable fusion of information across both streams. We evaluate Faster-slow on the Atomic Visual Actions dataset and on the Faster-AVA dataset constructed in this paper, with promising experimental results.
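The abstract does not give layer definitions, so the following is only a toy sketch of two ideas it names: two streams that read the same clip at different temporal resolutions, and attention weights derived from differences between adjacent frames. It is not the authors' implementation. The function names, the stride values, and the use of a softmax over absolute frame differences are all assumptions made for illustration.

```python
import math

def sample_indices(num_frames, stride):
    """Indices of the frames kept by a stream that reads every `stride`-th frame."""
    return list(range(0, num_frames, stride))

def motion_attention(features):
    """Softmax weights over frames, scored by change from the previous frame.

    `features` is a list of per-frame feature vectors (lists of floats).
    The first frame has no predecessor, so its motion score is 0.
    This scoring rule is an assumption, not the paper's MM component.
    """
    scores = [0.0]
    for prev, cur in zip(features, features[1:]):
        scores.append(sum(abs(c - p) for p, c in zip(prev, cur)))
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reweight(features, weights):
    """Scale each frame's feature vector by its attention weight."""
    return [[w * x for x in frame] for frame, w in zip(features, weights)]

# Toy clip: 16 frames with a 2-dimensional feature each; the content changes at frame 8.
clip = [[0.0, 0.0] for _ in range(8)] + [[1.0, 1.0] for _ in range(8)]

slow_idx = sample_indices(len(clip), stride=8)  # coarse temporal resolution (hypothetical stride)
fast_idx = sample_indices(len(clip), stride=2)  # fine temporal resolution (hypothetical stride)

fast_feats = [clip[i] for i in fast_idx]
weights = motion_attention(fast_feats)
enhanced = reweight(fast_feats, weights)
```

In this toy clip the finer-resolution stream assigns its largest weight to the sampled frame where the content changes, which is the kind of adjacent-frame dependency the abstract attributes to MM. The channel and spatio-temporal enhancement attributed to STC is not sketched here because the abstract gives no detail about it.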


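Source journal: Journal of Visual Communication and Image Representation (Engineering and Technology - Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months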

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.