KMANet: A spatio-temporal enhancement network for micro-action recognition
Jian Zhou, Jingchao Yao, Nan Su, Jingchen Lu, Qingyang Yu, Yichi Zhang, Wenqiang Hu
Knowledge-Based Systems, vol. 327, Article 114139 (published 2025-07-24). DOI: 10.1016/j.knosys.2025.114139
Citations: 0
Abstract
Action recognition technology has gained widespread application due to its ability to capture and process fine-grained motion details. Recent research has increasingly focused on analyzing individual emotions and intentions, bringing greater attention to micro-action recognition (MAR), which involves subtle and low-intensity movements. However, MAR faces several challenges, such as subtle variations in motion amplitude and highly similar visual features. These factors limit the effectiveness of traditional action recognition methods in achieving high detection accuracy. To address these limitations, we drew inspiration from the MAR benchmark MANet and focused on temporal feature modeling and on the effectively discriminative regions of micro-actions. Accordingly, we propose KMANet, a MAR framework with a collaborative mechanism that adopts a two-stage spatiotemporal feature enhancement strategy. Specifically, in the temporal dimension, we design a Key Frame Attention Mechanism (KFAM) to automatically focus on key-frame sequences of micro-actions and capture inter-frame dynamic relationships, thereby reducing the interference of non-essential frames. This approach effectively addresses the issue of insignificant motion amplitude changes. The Micro-Action Focus Module (MAFM), integrated on top of KFAM, further enhances local spatial features and reinforces detailed representation in core motion regions. Together, the two modules achieve a substantial improvement in recognition accuracy at minor computational cost. Extensive experiments on the MAR datasets MA-52 and BBSI demonstrate that, in comparison to state-of-the-art methods, KMANet fulfills the requirements of fine-grained scenario detection and attains superior recognition accuracy and performance in micro-action recognition tasks.
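To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch of (1) a key-frame attention over the temporal axis, in the spirit of KFAM, and (2) a spatial focus module over per-frame feature maps, in the spirit of MAFM. The module names, tensor shapes, and layer choices are assumptions made purely for illustration; the abstract does not specify KMANet's actual implementation.

```python
# Hedged sketch of key-frame attention + spatial focus, NOT the published KMANet code.
import torch
import torch.nn as nn


class KeyFrameAttention(nn.Module):
    """Scores each frame and reweights the sequence so informative
    (key) frames dominate the temporal representation."""

    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.ReLU(inplace=True),
            nn.Linear(dim // 4, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) frame-level features
        weights = torch.softmax(self.scorer(x), dim=1)   # (B, T, 1) per-frame importance
        return x * weights                               # down-weight non-essential frames


class MicroActionFocus(nn.Module):
    """Produces a spatial mask that emphasizes the local regions
    where the subtle motion occurs."""

    def __init__(self, channels: int):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (batch, channels, H, W) per-frame spatial features
        return feat * self.mask(feat)                    # highlight core motion regions


if __name__ == "__main__":
    frames = torch.randn(2, 16, 256)    # toy clip: 16 frames of 256-dim features
    fmap = torch.randn(2, 64, 14, 14)   # toy spatial feature map for one frame
    print(KeyFrameAttention(256)(frames).shape)   # torch.Size([2, 16, 256])
    print(MicroActionFocus(64)(fmap).shape)       # torch.Size([2, 64, 14, 14])
```

In this reading, the temporal reweighting handles the "insignificant motion amplitude" problem by suppressing uninformative frames, while the spatial mask addresses "highly similar visual features" by concentrating on discriminative local regions; how KMANet combines the two stages and at what point in the backbone is not described in the abstract.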
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.