{"title":"Multi-scale spatial-temporal transformer for 3D human pose estimation","authors":"Yongpeng Wu, Junna Gao","doi":"10.1109/ICVISP54630.2021.00051","DOIUrl":null,"url":null,"abstract":"Most existing video-based 3D human pose estimation methods focus on single-scale spatial and temporal feature extraction. However, many human motions are only related to local joints, which suggests that we need to pay attention to the local pose of the human body for 3D pose estimation. In this paper, we propose a novel multi-scale spatial-temporal transformer framework to tackle the problem of 3D human pose estimation. Our framework mainly consists of two separate modules: a multi-scale spatial transformer module and a multiscale temporal transformer module. The first module is designed to enhance the spatial dependencies by the joint-level and part-level spatial transformers. The goal for the second module is to capture the temporal correlation of human pose by the local part-level and global whole-level temporal transformer. Then we apply a weight fusion module to predict accurate 3D human pose of the center frame. Experimental results show that our method achieves excellent performance.","PeriodicalId":296789,"journal":{"name":"2021 5th International Conference on Vision, Image and Signal Processing (ICVISP)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 5th International Conference on Vision, Image and Signal Processing (ICVISP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICVISP54630.2021.00051","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Most existing video-based 3D human pose estimation methods focus on single-scale spatial and temporal feature extraction. However, many human motions involve only local joints, which suggests that the local pose of the human body should be modeled explicitly for 3D pose estimation. In this paper, we propose a novel multi-scale spatial-temporal transformer framework for 3D human pose estimation. Our framework consists of two separate modules: a multi-scale spatial transformer module and a multi-scale temporal transformer module. The first module enhances spatial dependencies through joint-level and part-level spatial transformers. The second module captures the temporal correlation of human pose through local part-level and global whole-level temporal transformers. A weight fusion module then combines these features to predict an accurate 3D human pose for the center frame. Experimental results show that our method achieves excellent performance.
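The two-scale spatial design described above can be illustrated with a minimal sketch. This is not the authors' implementation: the attention function, the 5-part grouping of a 17-joint skeleton, and the fixed fusion weight `alpha` are all assumptions chosen for illustration; in the actual framework the transformers are learned and the fusion weights are predicted by a module.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product self-attention over tokens.
    x: (tokens, dim) array; returns an array of the same shape."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

rng = np.random.default_rng(0)
num_joints, dim = 17, 32  # a common 17-joint skeleton (assumption)
joints = rng.standard_normal((num_joints, dim))

# Joint-level branch: attention across all individual joints.
joint_feat = self_attention(joints)

# Part-level branch: pool joints into 5 hypothetical body parts
# (torso/head, two legs, two arms -- an illustrative grouping),
# attend across parts, then broadcast each part feature back to
# its member joints so both branches share the per-joint shape.
parts = [[0, 7, 8, 9, 10], [1, 2, 3], [4, 5, 6],
         [11, 12, 13], [14, 15, 16]]
part_tokens = np.stack([joints[idx].mean(axis=0) for idx in parts])
part_feat = self_attention(part_tokens)
part_broadcast = np.zeros_like(joints)
for p, idx in enumerate(parts):
    part_broadcast[idx] = part_feat[p]

# Weighted fusion of the two scales (a learned weight in practice;
# a fixed scalar here for the sketch).
alpha = 0.6
fused = alpha * joint_feat + (1 - alpha) * part_broadcast
print(fused.shape)
```

The temporal module would apply the same pattern along the time axis, attending within local clips (part-level) and across the whole sequence (whole-level) before the center-frame pose is regressed.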