Enhancing video-based human mesh recovery with Attention-Mamba synergy

IF 7.5; Zone 1 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

Expert Systems with Applications, vol. 279, Article 127415. Pub Date: 2025-04-01. DOI: 10.1016/j.eswa.2025.127415

Silong Sheng, Tianyou Zheng, Zhijie Ren, Shengjiang Zhang, Yang Zhang, Weiwei Fu

Citations: 0

Abstract

Enhancing video-based human mesh recovery with Attention-Mamba synergy

While significant progress has been made in recovering a 3D human body mesh from a single image, 3D human motion recovery from video still leaves ample room for improvement in accuracy and smoothness. Existing video-based approaches typically rely on a single method to extract spatiotemporal features from video data, then use those features to estimate complex pose and shape parameters and reconstruct the human body mesh. However, their generic feature extraction and the limited representational capacity of parametric models often lead to discontinuous motion and inaccurate body shapes. To address these issues, we decouple the temporal and spatial features of video data using Mamba (a state-space sequence model) and an attention mechanism (a learnable module that focuses on the relevant features of the input). Additionally, we propose a Multihead Adaptive Agent Attention (MAAA) module, a new type of attention mechanism, and apply it to a non-parametric mesh reconstruction method to improve mesh recovery. The proposed method consists of two stages: (1) 3D human pose estimation from video and (2) reconstruction of the human body mesh from the 3D pose and RGB information in the video. Quantitative and qualitative experiments on public datasets show that our approach surpasses previous methods in temporal continuity and spatial accuracy, for example improving MPVPE by 4.4% and ACCEL by 9.7% on the Human3.6M dataset.
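
The abstract reports results in MPVPE and ACCEL without defining them. As a rough illustration only, the sketch below follows the way these metrics are commonly defined in the human mesh recovery literature: MPVPE as the mean Euclidean distance between predicted and ground-truth mesh vertices, and ACCEL as the mean distance between predicted and ground-truth accelerations obtained by second-order finite differences over frames. The function names (`mpvpe`, `accel`, `accel_error`) and the list-based data layout are assumptions made for this example; this is not the authors' evaluation code, and any alignment or unit conventions they use are not stated in the abstract.

```python
import math
from typing import List, Sequence

Point = Sequence[float]   # (x, y, z) coordinates of one vertex or joint
Frame = Sequence[Point]   # all points in one video frame
Clip = Sequence[Frame]    # frames in temporal order


def mpvpe(pred: Clip, gt: Clip) -> float:
    """Mean per-vertex position error: average Euclidean distance
    between corresponding predicted and ground-truth points."""
    dists = [
        math.dist(p, g)
        for pred_frame, gt_frame in zip(pred, gt)
        for p, g in zip(pred_frame, gt_frame)
    ]
    return sum(dists) / len(dists)


def accel(clip: Clip) -> List[List[List[float]]]:
    """Per-point acceleration via the second-order finite difference
    x[t-1] - 2*x[t] + x[t+1]; the result has two fewer frames."""
    return [
        [
            [a - 2.0 * b + c for a, b, c in zip(p0, p1, p2)]
            for p0, p1, p2 in zip(f0, f1, f2)
        ]
        for f0, f1, f2 in zip(clip, clip[1:], clip[2:])
    ]


def accel_error(pred: Clip, gt: Clip) -> float:
    """Mean distance between predicted and ground-truth accelerations.
    Requires at least three frames."""
    return mpvpe(accel(pred), accel(gt))


# Toy clip: one point moving at constant velocity in the ground truth,
# with the prediction drifting by 3 units along z in the last frame.
gt = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
pred = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(2.0, 0.0, 3.0)]]
print(mpvpe(pred, gt), accel_error(pred, gt))  # prints: 1.0 3.0
```

Both values come out in whatever length unit the input coordinates use. The toy clip shows why the two metrics are reported together: a single late-frame error averages down in the position metric but shows up in full in the acceleration metric, which is what makes ACCEL a proxy for temporal smoothness.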

Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.