A dynamic predictive transformer with temporal relevance regression for action detection

IF 7.5 · Tier 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition · Pub Date: 2025-04-14 · DOI: 10.1016/j.patcog.2025.111644

Matthew Korban, Peter Youngs, Scott T. Acton

Pattern Recognition, volume 166, Article 111644. Not open access. Full record: https://www.sciencedirect.com/science/article/pii/S0031320325003048

Citations: 0

Abstract


A dynamic predictive transformer with temporal relevance regression for action detection

This paper introduces a novel transformer network tailored to skeleton-based action detection in long, untrimmed video streams. The approach centers on three mechanisms that together strengthen the network’s temporal analysis. First, a new predictive attention mechanism incorporates future-frame data into the sequence analysis during the training phase. It addresses a central weakness of current action detection models: incomplete temporal modeling of long action sequences, particularly for boundary frames that lie outside the network’s immediate temporal receptive field, and it does so while maintaining computational efficiency. Second, a new adaptive weighted temporal attention system dynamically evaluates the importance of each frame within an action sequence. In contrast to existing approaches, the proposed weighting strategy is both adaptive and interpretable, which makes it effective on long sequences containing many non-informative frames. Third, the network incorporates a regression technique that independently identifies the start and end frames of an action based on their relevance to different frames. Unlike existing homogeneous regression methods, the proposed method is heterogeneous and draws on various temporal relationships, including those involving future frames within actions, making it more effective for action detection. Extensive experiments on the untrimmed skeleton-based action datasets PKU-MMD and OAD, and on the Charades dataset, demonstrate the effectiveness of the network.
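The abstract gives no equations, so the following is only a toy illustration of two ideas it names: weighting frames by importance, and choosing start and end frames from separate scores. It is not the authors' architecture. Every name here (`frame_weights`, `weighted_pool`, `pick_boundaries`, `scoring_vector`) is hypothetical, the scoring vector stands in for learned parameters, and the boundary scores are made-up numbers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_weights(features, scoring_vector):
    """Score each frame by a dot product with a (hypothetical) learned
    vector, then normalize the scores into weights that sum to 1."""
    scores = [
        sum(f * w for f, w in zip(frame, scoring_vector))
        for frame in features
    ]
    return softmax(scores)


def weighted_pool(features, weights):
    """Weighted average of the per-frame feature vectors."""
    dim = len(features[0])
    return [
        sum(w * frame[d] for w, frame in zip(weights, features))
        for d in range(dim)
    ]


def pick_boundaries(start_scores, end_scores):
    """Pick start and end frames from two independent score lists,
    constraining the end to fall at or after the start."""
    start = max(range(len(start_scores)), key=lambda t: start_scores[t])
    end = max(range(start, len(end_scores)), key=lambda t: end_scores[t])
    return start, end


# Toy sequence: 5 frames with 2-D features; frames 1-3 carry the "action".
features = [[0.0, 0.1], [1.0, 0.9], [1.2, 1.1], [0.9, 1.0], [0.1, 0.0]]
scoring_vector = [1.0, 1.0]  # assumption: stands in for learned parameters

weights = frame_weights(features, scoring_vector)
pooled = weighted_pool(features, weights)
start, end = pick_boundaries(
    [0.1, 0.8, 0.3, 0.2, 0.1],  # made-up per-frame start scores
    [0.1, 0.2, 0.3, 0.9, 0.2],  # made-up per-frame end scores
)
print(weights, pooled, (start, end))
```

Because the weights are an explicit distribution over frames, they can be inspected directly to see which frames the pooling relies on, which is one simple sense in which such a weighting is interpretable. Using separate score lists for the start and the end mirrors the idea of identifying the two boundaries independently.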


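Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Publication volume: 683
Review time: 5.6 months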

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.