MDT: A multiscale differencing transformer with sequence feature relationship mining for robust action recognition

IF 3.5 | Zone 2, Computer Science | JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zengzhao Chen, Fumei Ma, Hai Liu, Wenkai Huang, Tingting Liu
Journal: Applied Intelligence, 55(13)
Published: 2025-08-30
Publication type: Journal Article
DOI: 10.1007/s10489-025-06861-z
URL: https://link.springer.com/article/10.1007/s10489-025-06861-z
Citations: 0

Abstract

MDT: A multiscale differencing transformer with sequence feature relationship mining for robust action recognition

Skeleton-based action recognition, which analyzes joint coordinates and bone connections to classify human actions, is important in understanding and analyzing human dynamic behaviors. However, actions in complex scenes have a high degree of similarity and variability, with the dynamic changes in human skeletons and subtle temporal variations in particular posing significant challenges to the accuracy and robustness of action recognition systems. To mitigate these challenges, we propose a novel multiscale differencing transformer (MDT) with sequence feature relationship mining for robust action recognition. MDT effectively mines inter-frame timing information and feature distribution differences across multiple scales, enabling a deeper understanding of the nuances between actions. Specifically, we first propose multiscale differential self-attention to handle the need for understanding action changes across multiple time scales, improving the capacity of the model to effectively capture the global and local dynamic features of actions. Then, we introduce a sequence feature relationship mining module to address complex data patterns in scenes that may span multiple sequences, exhibiting both similar and distinct characteristics. By utilizing coarse- and fine-grained sequence information, this module empowers the model to recognize intricate data patterns. On the NTU RGB+D 60 dataset, the proposed MDT model outperforms the recent STAR-Transformer by 1.6% on the Cross-Subject (CS) setting and 1.1% on the Cross-View (CV) setting, demonstrating its consistent effectiveness across different evaluation protocols.
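
The abstract gives no equations, so the following is only an illustrative sketch of what "differencing across multiple time scales" could look like on a skeleton sequence. It is not the authors' multiscale differential self-attention. The function name `multiscale_differences`, the strides `(1, 2, 4)`, and the zero-filling of the first frames are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

# One frame is a list of joints; each joint is a list of coordinates.
Frame = List[List[float]]


def multiscale_differences(
    frames: Sequence[Frame], scales: Sequence[int] = (1, 2, 4)
) -> Dict[int, List[Frame]]:
    """For each stride s, return frames[t] - frames[t - s] per joint coordinate.

    The first s frames have no predecessor at that stride and are zero-filled,
    so every output sequence has the same length as the input.
    (Hypothetical helper; the strides and padding are assumptions.)
    """
    result: Dict[int, List[Frame]] = {}
    for s in scales:
        if s < 1:
            raise ValueError("scales must be positive integers")
        diffs: List[Frame] = []
        for t, frame in enumerate(frames):
            if t < s:
                # No frame exists s steps back: pad with zeros.
                diffs.append([[0.0] * len(joint) for joint in frame])
            else:
                prev = frames[t - s]
                diffs.append(
                    [
                        [c - p for c, p in zip(joint, prev_joint)]
                        for joint, prev_joint in zip(frame, prev)
                    ]
                )
        result[s] = diffs
    return result


# Toy sequence: 5 frames, 1 joint, 2D coordinates moving at constant velocity.
frames = [[[float(t), 2.0 * t]] for t in range(5)]
diffs = multiscale_differences(frames, scales=(1, 2))
print(diffs[1][3])  # motion over one frame
print(diffs[2][3])  # motion over two frames
```

In this toy setup, a small stride responds to fine, frame-to-frame motion and a larger stride to slower, coarser motion. A model in the spirit of the abstract might feed such per-scale difference features into attention layers, but how MDT actually combines them is not stated here.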

Source journal: Applied Intelligence
Category: Engineering & Technology, Computer Science: Artificial Intelligence
CiteScore: 6.60
Self-citation rate: 20.80%
Articles published: 1361
Review time: 5.9 months
About the journal: With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.