Silong Sheng, Tianyou Zheng, Zhijie Ren, Shengjiang Zhang, Yang Zhang, Weiwei Fu
Expert Systems with Applications, Volume 279, Article 127415. Published 2025-04-01. Impact factor 7.5; JCR Q1 (Computer Science, Artificial Intelligence).
DOI: 10.1016/j.eswa.2025.127415
https://www.sciencedirect.com/science/article/pii/S0957417425010371
Enhancing video-based human mesh recovery with Attention-Mamba synergy
While significant progress has been made in recovering a 3D human body mesh from a single image, 3D human motion recovery from video still leaves ample room for improvement in accuracy and smoothness. Existing video-based approaches typically use a single method to extract spatiotemporal features from video data, then estimate complex pose and shape parameters to reconstruct the human body mesh. However, their generalized feature extraction and the limited representational capacity of parametric models often lead to discontinuous motion and inaccurate body shapes. To address these issues, we decouple the temporal and spatial features of video data using Mamba (a state-space sequence model) and an attention mechanism (a learnable module that weights input features by relevance). Additionally, we propose a Multihead Adaptive Agent Attention (MAAA) module, a new type of attention mechanism, which is applied to the non-parametric mesh reconstruction method to improve mesh recovery. The proposed method consists of two stages: (1) 3D human pose estimation from videos and (2) reconstruction of the human body mesh from the 3D pose and RGB information in the videos. Quantitative and qualitative experiments demonstrate that our approach surpasses previous methods in temporal continuity and spatial accuracy on public datasets, including improvements of 4.4% in MPVPE and 9.7% in ACCEL on the Human3.6M dataset.
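The abstract reports MPVPE and ACCEL without defining them. The sketch below follows the convention commonly used in the video mesh recovery literature, which is an assumption here and not necessarily the authors' evaluation code: MPVPE is the mean Euclidean distance between predicted and ground-truth mesh vertices, and ACCEL is the mean distance between predicted and ground-truth accelerations, with acceleration taken as a second-order finite difference over frames. The function names `mpvpe` and `accel_error` and the toy data are hypothetical.

```python
import math

def mpvpe(pred, gt):
    """Mean per-vertex position error.

    pred, gt: list of frames; each frame is a list of (x, y, z) points.
    Returns the average Euclidean distance over all frames and points.
    """
    total, count = 0.0, 0
    for frame_p, frame_g in zip(pred, gt):
        for p, g in zip(frame_p, frame_g):
            total += math.dist(p, g)
            count += 1
    return total / count

def _acceleration(seq):
    """Second-order finite difference x[t-1] - 2*x[t] + x[t+1] per point."""
    out = []
    for t in range(1, len(seq) - 1):
        frame = []
        for p0, p1, p2 in zip(seq[t - 1], seq[t], seq[t + 1]):
            frame.append(tuple(a - 2 * b + c for a, b, c in zip(p0, p1, p2)))
        out.append(frame)
    return out

def accel_error(pred, gt):
    """Mean distance between predicted and ground-truth accelerations."""
    return mpvpe(_acceleration(pred), _acceleration(gt))

# Toy sequence (assumed data): one point moving at constant velocity along x.
gt = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
# The prediction matches except for a one-frame jitter along z.
pred = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 1.0)], [(2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]

print(mpvpe(pred, gt))        # 0.25
print(accel_error(pred, gt))  # 1.5
```

The toy case illustrates why the two metrics are reported together: a single jittery frame barely moves the position error but produces a large acceleration error, which is the temporal discontinuity the abstract refers to.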
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.