Hai Yuan, Xia Yuan, Yanli Liu, Guanyu Xing, Jing Hu, Xi Wu, Zijun Zhou
Title: Adaptive mesh-aligned Gaussian Splatting for monocular human avatar reconstruction
DOI: 10.1016/j.gmod.2025.101300
Journal: Graphical Models, volume 141, Article 101300
Publication date: 2025-08-25
Publication type: Journal Article
Impact factor: 2.2 (JCR Q2, Computer Science, Software Engineering)
URL: https://www.sciencedirect.com/science/article/pii/S1524070325000475
Citation count: 0
Abstract
Virtual human avatars are essential for applications such as gaming, augmented reality, and virtual production. However, existing methods struggle to achieve high-fidelity reconstruction from monocular input while keeping hardware costs low. Many approaches rely on the SMPL body prior and apply vertex offsets to represent clothed avatars. Unfortunately, excessive offsets often cause misalignment and blurred contours, particularly around clothing wrinkles, silhouette boundaries, and facial regions. To address these limitations, we propose a dual-branch framework for human avatar reconstruction from monocular video. A lightweight Vertex Align Net (VAN) predicts per-vertex offsets along the normal direction on the SMPL mesh to achieve coarse geometric alignment and guide Gaussian-based human avatar modeling. In parallel, we construct a high-resolution facial Gaussian branch based on FLAME-estimated parameters, with facial regions localized via pretrained detectors. The facial and body renderings are fused using a semantic mask to enhance facial clarity and ensure a globally consistent avatar appearance. Experiments demonstrate that our method surpasses state-of-the-art approaches in modeling animatable human avatars with fine-grained fidelity.
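The two geometric operations named in the abstract, displacing mesh vertices along their normals and fusing two renderings with a mask, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`vertex_normals`, `offset_along_normals`, `fuse_with_mask`), the area-weighted normal estimate, and the linear alpha blend are all assumptions made for illustration. In the paper the offsets are predicted by VAN and the renderings come from Gaussian Splatting, whereas here they are plain input arrays.

```python
import numpy as np


def vertex_normals(vertices, faces):
    """Area-weighted unit normals per vertex of a triangle mesh (illustrative helper)."""
    tri = vertices[faces]  # (F, 3, 3): the three corners of each face
    # Cross product of two edges; its length is twice the triangle area.
    face_n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    normals = np.zeros_like(vertices)
    for k in range(3):
        # Accumulate each face normal onto its k-th vertex (handles repeated indices).
        np.add.at(normals, faces[:, k], face_n)
    lengths = np.linalg.norm(normals, axis=1, keepdims=True)
    return normals / np.clip(lengths, 1e-12, None)


def offset_along_normals(vertices, faces, offsets):
    """Move each vertex by a scalar offset along its normal (offsets assumed given)."""
    return vertices + offsets[:, None] * vertex_normals(vertices, faces)


def fuse_with_mask(body_rgb, face_rgb, face_mask):
    """Blend two (H, W, 3) renderings with an (H, W) mask in [0, 1] (assumed linear blend)."""
    m = face_mask[..., None].astype(body_rgb.dtype)
    return m * face_rgb + (1.0 - m) * body_rgb


# Toy usage: one triangle in the xy-plane, so every vertex normal is +z.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
moved = offset_along_normals(verts, tris, np.array([0.1, 0.2, 0.3]))

body = np.zeros((2, 2, 3))
face = np.ones((2, 2, 3))
mask = np.array([[1.0, 0.0], [0.0, 0.5]])
fused = fuse_with_mask(body, face, mask)
```

Restricting each offset to a single scalar along the normal keeps the displaced surface tied to the underlying body template, which is one plausible reading of why the abstract contrasts this with "excessive offsets".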
About the journal:
Graphical Models is recognized internationally as a highly rated, top-tier journal focused on the creation, geometric processing, animation, and visualization of graphical models and on their applications in engineering, science, culture, and entertainment. GMOD provides its readers with thoroughly reviewed and carefully selected papers that disseminate exciting innovations, teach rigorous theoretical foundations, propose robust and efficient solutions, or describe ambitious systems or applications in a variety of topics.
We invite papers in five categories: research (contributions of novel theoretical or practical approaches or solutions), survey (opinionated views of the state of the art and challenges in a specific topic), system (the architecture and implementation details of an innovative architecture for a complete system that supports model/animation design, acquisition, analysis, visualization, etc.), application (description of a novel application of known techniques and evaluation of its impact), or lecture (an elegant and inspiring perspective on previously published results that clarifies them and teaches them in a new way).
GMOD offers its authors an accelerated review, feedback from experts in the field, immediate online publication of accepted papers, no restriction on color and length (when justified by the content) in the online version, and a broad promotion of published papers. A prestigious group of editors selected from among the premier international researchers in their fields oversees the review process.