Xinrong Hu, Kaifan Yang, Ruiqi Luo, Tao Peng, Junping Liu
DOI: 10.1016/j.vrih.2022.08.014
Journal: Virtual Reality Intelligent Hardware, Vol. 7, No. 4, pp. 379-392
Publication date: 2025-08-01 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S2096579622000894
Learning monocular face reconstruction from in-the-wild images using rotation cycle consistency
With the rising popularity of digital humans, monocular three-dimensional (3D) face reconstruction is widely used in fields such as animation and face recognition. Although current methods trained on single-view image sets perform well in monocular 3D face reconstruction tasks, they tend to rely on the constraints of a prior model or on the appearance of the input images, fundamentally because no effective way has been found to reduce the effects of two-dimensional (2D) ambiguity. To address this problem, we developed an unsupervised training framework for monocular 3D face reconstruction based on rotational cycle consistency. Specifically, to learn more accurate facial information, we first use an autoencoder to factor the input image and apply these factors to generate a normalized frontal view. We then pass the result through a differentiable renderer and use rotational consistency to iteratively refine the perceived geometry. Our method imposes implicit multi-view consistency constraints on the pose and depth estimation of the input face, and it remains accurate and robust under large variations in expression and pose. In benchmark tests, our method reconstructed 3D faces from monocular 2D images more stably and realistically than competing methods.
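The core intuition behind rotational cycle consistency can be illustrated with a toy sketch. This is not the paper's implementation (which applies the cycle to rendered images via a differentiable renderer and learned factors); here we assume only a set of hypothetical 3D landmarks and show that a pose-then-unpose cycle on geometry yields a residual that vanishes exactly when the estimated rotation matches the true one, so it can supervise pose without 3D ground truth:

```python
import numpy as np

def rot_y(theta):
    """Rotation about the vertical (yaw) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def cycle_loss(canonical_pts, r_true, r_est):
    """Pose canonical face points with the true rotation, map them back to
    the frontal frame with the *estimated* rotation, and measure the
    residual against the canonical shape. The residual is zero only when
    r_est agrees with r_true."""
    posed = canonical_pts @ r_true.T       # canonical -> observed view
    refronted = posed @ r_est              # observed -> estimated frontal view
    return float(np.mean(np.linalg.norm(refronted - canonical_pts, axis=1)))

# A toy "face": a handful of 3D landmarks in the canonical frontal frame
# (entirely made up for illustration).
face = np.array([[ 0.0,  0.5, 0.1],   # forehead
                 [-0.3,  0.1, 0.0],   # left eye
                 [ 0.3,  0.1, 0.0],   # right eye
                 [ 0.0, -0.1, 0.3],   # nose tip
                 [ 0.0, -0.4, 0.1]])  # chin

r_true = rot_y(np.deg2rad(30))
good = cycle_loss(face, r_true, rot_y(np.deg2rad(30)))  # correct pose estimate
bad = cycle_loss(face, r_true, rot_y(np.deg2rad(10)))   # 20-degree pose error
print(good, bad)
```

With the correct estimate the cycle closes and the loss is numerically zero; with a wrong estimate the loss grows with the pose error, which is the implicit multi-view constraint the training framework exploits.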