用于视频中人体姿态估计的混合注意力自适应采样网络

IF 1.7 4区计算机科学 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Animation and Virtual Worlds Pub Date : 2024-08-20 DOI:10.1002/cav.2244

Qianyun Song, Hao Zhang, Yanan Liu, Shouzheng Sun, Dan Xu

{"title":"用于视频中人体姿态估计的混合注意力自适应采样网络","authors":"Qianyun Song, Hao Zhang, Yanan Liu, Shouzheng Sun, Dan Xu","doi":"10.1002/cav.2244","DOIUrl":null,"url":null,"abstract":"<p>Human pose estimation in videos often uses sampling strategies like sparse uniform sampling and keyframe selection. Sparse uniform sampling can miss spatial-temporal relationships, while keyframe selection using CNNs struggles to fully capture these relationships and is costly. Neither strategy ensures the reliability of pose data from single-frame estimators. To address these issues, this article proposes an efficient and effective hybrid attention adaptive sampling network. This network includes a dynamic attention module and a pose quality attention module, which comprehensively consider the dynamic information and the quality of pose data. Additionally, the network improves efficiency through compact uniform sampling and parallel mechanism of multi-head self-attention. Our network is compatible with various video-based pose estimation frameworks and demonstrates greater robustness in high degree of occlusion, motion blur, and illumination changes, achieving state-of-the-art performance on Sub-JHMDB dataset.</p>","PeriodicalId":50645,"journal":{"name":"Computer Animation and Virtual Worlds","volume":"35 4","pages":""},"PeriodicalIF":1.7000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hybrid attention adaptive sampling network for human pose estimation in videos\",\"authors\":\"Qianyun Song, Hao Zhang, Yanan Liu, Shouzheng Sun, Dan Xu\",\"doi\":\"10.1002/cav.2244\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Human pose estimation in videos often uses sampling strategies like sparse uniform sampling and keyframe selection. Sparse uniform sampling can miss spatial-temporal relationships, while keyframe selection using CNNs struggles to fully capture these relationships and is costly. Neither strategy ensures the reliability of pose data from single-frame estimators. To address these issues, this article proposes an efficient and effective hybrid attention adaptive sampling network. This network includes a dynamic attention module and a pose quality attention module, which comprehensively consider the dynamic information and the quality of pose data. Additionally, the network improves efficiency through compact uniform sampling and parallel mechanism of multi-head self-attention. Our network is compatible with various video-based pose estimation frameworks and demonstrates greater robustness in high degree of occlusion, motion blur, and illumination changes, achieving state-of-the-art performance on Sub-JHMDB dataset.</p>\",\"PeriodicalId\":50645,\"journal\":{\"name\":\"Computer Animation and Virtual Worlds\",\"volume\":\"35 4\",\"pages\":\"\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Animation and Virtual Worlds\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/cav.2244\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Animation and Virtual Worlds","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cav.2244","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

视频中的人体姿态估计通常使用稀疏均匀采样和关键帧选择等采样策略。稀疏均匀采样会遗漏空间-时间关系，而使用 CNN 的关键帧选择则难以完全捕捉这些关系，而且成本高昂。这两种策略都无法确保来自单帧估计器的姿态数据的可靠性。为了解决这些问题，本文提出了一种高效的混合注意力自适应采样网络。该网络包括一个动态注意力模块和一个姿态质量注意力模块，全面考虑了姿态数据的动态信息和质量。此外，该网络还通过紧凑的均匀采样和多头自注意并行机制提高了效率。我们的网络兼容各种基于视频的姿态估计框架，在高度遮挡、运动模糊和光照变化的情况下表现出更强的鲁棒性，在 Sub-JHMDB 数据集上取得了最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Hybrid attention adaptive sampling network for human pose estimation in videos

Human pose estimation in videos often uses sampling strategies like sparse uniform sampling and keyframe selection. Sparse uniform sampling can miss spatial-temporal relationships, while keyframe selection using CNNs struggles to fully capture these relationships and is costly. Neither strategy ensures the reliability of pose data from single-frame estimators. To address these issues, this article proposes an efficient and effective hybrid attention adaptive sampling network. This network includes a dynamic attention module and a pose quality attention module, which comprehensively consider the dynamic information and the quality of pose data. Additionally, the network improves efficiency through compact uniform sampling and parallel mechanism of multi-head self-attention. Our network is compatible with various video-based pose estimation frameworks and demonstrates greater robustness in high degree of occlusion, motion blur, and illumination changes, achieving state-of-the-art performance on Sub-JHMDB dataset.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computer Animation and Virtual Worlds 工程技术-计算机：软件工程

CiteScore

2.20

自引率

0.00%

发文量

审稿时长

6-12 weeks

期刊介绍： With the advent of very powerful PCs and high-end graphics cards, there has been an incredible development in Virtual Worlds, real-time computer animation and simulation, games. But at the same time, new and cheaper Virtual Reality devices have appeared allowing an interaction with these real-time Virtual Worlds and even with real worlds through Augmented Reality. Three-dimensional characters, especially Virtual Humans are now of an exceptional quality, which allows to use them in the movie industry. But this is only a beginning, as with the development of Artificial Intelligence and Agent technology, these characters will become more and more autonomous and even intelligent. They will inhabit the Virtual Worlds in a Virtual Life together with animals and plants.