STELA: Spatial–temporal enhanced learning with an anatomical graph transformer for 3D human pose estimation

IF 4.3 · CAS Tier 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Jian Son, Jiho Lee, Eunwoo Kim
DOI: 10.1016/j.cviu.2025.104381
Computer Vision and Image Understanding, Volume 257, Article 104381 · Published 2025-05-09
Citations: 0

Abstract

Transformers have led to remarkable performance improvements in 3D human pose estimation by capturing global dependencies between joints in spatial and temporal aspects. To leverage human body topology information, attempts have been made to incorporate graph representation within a transformer architecture. However, they neglect spatial–temporal anatomical knowledge inherent in the human body, without considering the implicit relationships of non-connected joints. Furthermore, they disregard the movement patterns between joint trajectories, concentrating on the trajectories of individual joints. In this paper, we propose Spatial–Temporal Enhanced Learning with an Anatomical graph transformer (STELA) to aggregate the spatial–temporal global relationships and intricate anatomical relationships between joints. It consists of Global Self-attention (GS) and Anatomical Graph-attention (AG) branches. GS learns long-range dependencies between all joints across entire frames. AG focuses on the anatomical relationships of the human body in the spatial–temporal aspect using skeleton and motion pattern graphs. Extensive experiments demonstrate that STELA outperforms state-of-the-art approaches with an average of 41% fewer parameters, reducing MPJPE by an average of 2.7 mm on Human3.6M and 1.5 mm on MPI-INF-3DHP.
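
The abstract describes a two-branch attention design: a Global Self-attention (GS) branch over all joints across all frames, and an Anatomical Graph-attention (AG) branch guided by skeleton and motion pattern graphs. Below is a minimal PyTorch sketch of one possible reading of that design. All class names, shapes, the mask-based graph attention, the residual fusion, and the 17-joint skeleton are our illustrative assumptions; the paper's actual STELA architecture (including its motion pattern graph) is not specified by the abstract. A small helper for MPJPE, the error metric quoted above, is included.

```python
# Illustrative sketch only: the names, fusion, and masking scheme below are
# assumptions, not the paper's actual STELA implementation.
import torch
import torch.nn as nn


class GlobalSelfAttention(nn.Module):
    """GS branch: multi-head self-attention over every (frame, joint) token,
    capturing long-range dependencies between all joints across all frames."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T*J, dim)
        out, _ = self.attn(x, x, x)
        return out


class AnatomicalGraphAttention(nn.Module):
    """AG branch (simplified): per-frame self-attention masked so a joint only
    attends to itself and its skeleton neighbours. This stands in for
    attention guided by an anatomical graph."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (B*T, J, dim); adj: (J, J) bool adjacency with self-loops.
        out, _ = self.attn(x, x, x, attn_mask=~adj)  # True = masked out
        return out


class TwoBranchBlock(nn.Module):
    """Fuses the GS and AG branches with a residual sum (an assumed fusion;
    the abstract does not say how the branches are combined)."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.gs = GlobalSelfAttention(dim, heads)
        self.ag = AnatomicalGraphAttention(dim, heads)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        B, T, J, D = x.shape  # (batch, frames, joints, channels)
        gs = self.gs(x.reshape(B, T * J, D)).reshape(B, T, J, D)
        ag = self.ag(x.reshape(B * T, J, D), adj).reshape(B, T, J, D)
        return self.norm(x + gs + ag)


def mpjpe(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean Per Joint Position Error: mean Euclidean distance per joint,
    the metric quoted in the abstract. pred, gt: (..., J, 3) coordinates."""
    return (pred - gt).norm(dim=-1).mean()


if __name__ == "__main__":
    J = 17  # a Human3.6M-style 17-joint layout (assumed)
    bones = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6), (0, 7), (7, 8),
             (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15),
             (15, 16)]
    adj = torch.eye(J, dtype=torch.bool)  # self-loops keep every row valid
    for i, j in bones:
        adj[i, j] = adj[j, i] = True

    block = TwoBranchBlock(dim=64)
    x = torch.randn(2, 9, J, 64)  # 2 clips, 9 frames, 17 joints, 64 channels
    print(block(x, adj).shape)    # torch.Size([2, 9, 17, 64])
    print(mpjpe(torch.randn(2, J, 3), torch.randn(2, J, 3)))
```

Masking non-adjacent joints is one cheap way to inject skeleton structure into attention; learned adjacencies or additive attention biases are common alternatives and may be closer to what the paper actually does.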
Source journal
Computer Vision and Image Understanding
Category: Engineering Technology - Engineering: Electronic & Electrical
CiteScore: 7.80
Self-citation rate: 4.40%
Articles per year: 112
Review time: 79 days
Aims and scope: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems