Multi-scale spatial-temporal transformer for 3D human pose estimation

Yongpeng Wu, Junna Gao
DOI: 10.1109/ICVISP54630.2021.00051
Published in: 2021 5th International Conference on Vision, Image and Signal Processing (ICVISP), December 2021
Citations: 1

Abstract

Most existing video-based 3D human pose estimation methods focus on single-scale spatial and temporal feature extraction. However, many human motions involve only local joints, which suggests that 3D pose estimation should attend to the local pose of the human body. In this paper, we propose a novel multi-scale spatial-temporal transformer framework for 3D human pose estimation. Our framework consists of two separate modules: a multi-scale spatial transformer module and a multi-scale temporal transformer module. The first module enhances spatial dependencies through joint-level and part-level spatial transformers. The second module captures the temporal correlation of human pose through local part-level and global whole-level temporal transformers. We then apply a weight fusion module to predict an accurate 3D human pose for the center frame. Experimental results show that our method achieves excellent performance.
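The two-module design described above can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the body-part grouping, the fixed fusion weight (the paper learns it in a weight fusion module), the local temporal window size, and the use of parameter-free self-attention (no learned projections or multi-head structure) are all simplifying assumptions made here to show the data flow across scales.

```python
import numpy as np

def self_attention(x):
    """Parameter-free scaled dot-product self-attention over rows of x --
    a stand-in for a full transformer block (no learned projections)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

# Hypothetical grouping of 17 joints into 5 body parts (the paper's
# exact grouping is not specified in the abstract).
PARTS = {
    "torso":     [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg":  [4, 5, 6],
    "left_arm":  [11, 12, 13],
    "right_arm": [14, 15, 16],
}

def multi_scale_spatial(frame):
    """frame: (17, C) per-joint features for one video frame.
    Joint-level attention over all joints plus part-level attention
    within each part, fused with a fixed weight (learned in the paper)."""
    joint_out = self_attention(frame)          # joint-level scale
    part_out = frame.copy()
    for idxs in PARTS.values():                # part-level scale
        part_out[idxs] = self_attention(frame[idxs])
    alpha = 0.5                                # assumed fusion weight
    return alpha * joint_out + (1 - alpha) * part_out

def multi_scale_temporal(seq):
    """seq: (T, 17, C) spatially-refined features. Global whole-level
    attention over all T frames plus local attention in a window around
    the center frame; returns the fused center-frame feature, which
    would feed a regression head producing the 3D pose."""
    T, J, C = seq.shape
    flat = seq.reshape(T, J * C)
    global_out = self_attention(flat)          # global whole-level scale
    w = 3                                      # assumed local window radius
    c = T // 2
    lo, hi = max(0, c - w), min(T, c + w + 1)
    local_out = np.zeros_like(flat)
    local_out[lo:hi] = self_attention(flat[lo:hi])  # local scale
    fused = 0.5 * global_out + 0.5 * local_out      # assumed fusion weight
    return fused[c].reshape(J, C)              # center-frame feature
```

Running a 9-frame sequence of 17 joints through both modules yields a single (17, C) center-frame feature, mirroring the framework's per-clip, center-frame prediction.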