{"title":"基于极端姿态、深度和表情变化的RGBD视频鲁棒实时3D人脸跟踪","authors":"Hai Xuan Pham, V. Pavlovic","doi":"10.1109/3DV.2016.54","DOIUrl":null,"url":null,"abstract":"We introduce a novel end-to-end real-time pose-robust 3D face tracking framework from RGBD videos, which is capable of tracking head pose and facial actions simultaneously in unconstrained environment without intervention or pre-calibration from a user. In particular, we emphasize tracking the head pose from profile to profile and improving tracking performance in challenging instances, where the tracked subject is at a considerably large distance from the camera and the quality of data deteriorates severely. To achieve these goals, the tracker is guided by an efficient multi-view 3D shape regressor, trained upon generic RGB datasets, which is able to predict model parameters despite large head rotations or tracking range. Specifically, the shape regressor is made aware of the head pose by inferring the possibility of particular facial landmarks being visible through a joint regression-classification local random forest framework, and piecewise linear regression models effectively map visibility features into shape parameters. In addition, the regressor is combined with a joint 2D+3D optimization that sparsely exploits depth information to further refine shape parameters to maintain tracking accuracy over time. The result is a robust on-line RGBD 3D face tracker that can model extreme head poses and facial expressions accurately in challenging scenes, which are demonstrated in our extensive experiments.","PeriodicalId":425304,"journal":{"name":"2016 Fourth International Conference on 3D Vision (3DV)","volume":"458 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Robust Real-Time 3D Face Tracking from RGBD Videos under Extreme Pose, Depth, and Expression Variation\",\"authors\":\"Hai Xuan Pham, V. Pavlovic\",\"doi\":\"10.1109/3DV.2016.54\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We introduce a novel end-to-end real-time pose-robust 3D face tracking framework from RGBD videos, which is capable of tracking head pose and facial actions simultaneously in unconstrained environment without intervention or pre-calibration from a user. In particular, we emphasize tracking the head pose from profile to profile and improving tracking performance in challenging instances, where the tracked subject is at a considerably large distance from the camera and the quality of data deteriorates severely. To achieve these goals, the tracker is guided by an efficient multi-view 3D shape regressor, trained upon generic RGB datasets, which is able to predict model parameters despite large head rotations or tracking range. Specifically, the shape regressor is made aware of the head pose by inferring the possibility of particular facial landmarks being visible through a joint regression-classification local random forest framework, and piecewise linear regression models effectively map visibility features into shape parameters. In addition, the regressor is combined with a joint 2D+3D optimization that sparsely exploits depth information to further refine shape parameters to maintain tracking accuracy over time. 
The result is a robust on-line RGBD 3D face tracker that can model extreme head poses and facial expressions accurately in challenging scenes, which are demonstrated in our extensive experiments.\",\"PeriodicalId\":425304,\"journal\":{\"name\":\"2016 Fourth International Conference on 3D Vision (3DV)\",\"volume\":\"458 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 Fourth International Conference on 3D Vision (3DV)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/3DV.2016.54\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 Fourth International Conference on 3D Vision (3DV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/3DV.2016.54","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Robust Real-Time 3D Face Tracking from RGBD Videos under Extreme Pose, Depth, and Expression Variation
We introduce a novel end-to-end, real-time, pose-robust 3D face tracking framework for RGBD videos that tracks head pose and facial actions simultaneously in unconstrained environments, without user intervention or pre-calibration. In particular, we emphasize tracking the head pose from profile to profile and improving performance in challenging cases where the tracked subject is far from the camera and data quality deteriorates severely. To achieve these goals, the tracker is guided by an efficient multi-view 3D shape regressor, trained on generic RGB datasets, which can predict model parameters despite large head rotations or long tracking distances. Specifically, the shape regressor is made aware of the head pose by inferring the probability that particular facial landmarks are visible, via a joint regression-classification local random forest framework; piecewise linear regression models then map these visibility features to shape parameters. In addition, the regressor is combined with a joint 2D+3D optimization that sparsely exploits depth information to further refine the shape parameters and maintain tracking accuracy over time. The result is a robust online RGBD 3D face tracker that accurately models extreme head poses and facial expressions in challenging scenes, as demonstrated in our extensive experiments.
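To make the regression stage concrete, below is a minimal sketch of how per-landmark visibility probabilities (e.g., aggregated from the leaves of a random forest) could be mapped to shape parameters by a bank of piecewise linear models. This is an illustration under our own assumptions, not the authors' implementation: every name here (PiecewiseLinearRegressor, the dimensions D, P, K) is hypothetical, and the paper's exact partitioning of the feature space may differ from the nearest-center gating used here.

    import numpy as np

    class PiecewiseLinearRegressor:
        """Hypothetical sketch: each piece is a linear map, selected by
        nearest-center gating on the visibility feature vector."""

        def __init__(self, centers, weights, biases):
            self.centers = centers  # (K, D) gating centers, one per piece
            self.weights = weights  # (K, P, D) one linear map per piece
            self.biases = biases    # (K, P) one offset per piece

        def predict(self, v):
            # v: (D,) visibility feature, e.g., per-landmark visibility
            # probabilities aggregated over the forest's trees.
            k = np.argmin(np.linalg.norm(self.centers - v, axis=1))
            return self.weights[k] @ v + self.biases[k]

    # Toy usage: 68 landmark visibilities -> 30 shape parameters, 5 pieces.
    rng = np.random.default_rng(0)
    D, P, K = 68, 30, 5
    reg = PiecewiseLinearRegressor(rng.normal(size=(K, D)),
                                   rng.normal(size=(K, P, D)),
                                   rng.normal(size=(K, P)))
    v = rng.uniform(size=D)   # stand-in for the forest's visibility output
    theta = reg.predict(v)    # predicted shape-parameter vector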
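The refinement step can likewise be sketched as a sparse joint 2D+3D least-squares problem: dense 2D reprojection residuals at the landmarks, plus depth residuals at only a sparse subset of points, minimized over rigid-pose parameters. Again a sketch under assumed conventions (pinhole camera, axis-angle rotation, scipy's generic least-squares solver), not the paper's actual optimizer; the depth weight lambda_d and the point selection are placeholders.

    import numpy as np
    from scipy.optimize import least_squares

    def rodrigues(r):
        # Axis-angle vector -> rotation matrix (standard Rodrigues formula).
        theta = np.linalg.norm(r)
        if theta < 1e-12:
            return np.eye(3)
        k = r / theta
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

    def residuals(p, X, uv, z, depth_idx, f, c, lambda_d=0.5):
        # p = [axis-angle rotation (3), translation (3)]; X: (N, 3) model
        # points; uv: (N, 2) observed 2D landmarks; z: observed depths at
        # the sparse subset depth_idx. lambda_d is an assumed weight.
        R, t = rodrigues(p[:3]), p[3:]
        Xc = X @ R.T + t                          # points in camera frame
        proj = f * Xc[:, :2] / Xc[:, 2:3] + c     # pinhole projection
        r2d = (proj - uv).ravel()                 # dense 2D reprojection term
        r3d = lambda_d * (Xc[depth_idx, 2] - z)   # sparse depth term
        return np.concatenate([r2d, r3d])

    # Toy usage with synthetic observations (placeholders, not tracker state).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(68, 3))
    p_true = np.array([0.1, -0.2, 0.05, 0.02, -0.01, 5.0])
    f, c = 600.0, np.array([320.0, 240.0])
    Xc = X @ rodrigues(p_true[:3]).T + p_true[3:]
    uv = f * Xc[:, :2] / Xc[:, 2:3] + c           # simulated 2D landmarks
    depth_idx = np.arange(0, 68, 8)               # sparse depth samples
    z = Xc[depth_idx, 2]
    p0 = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 4.0])
    sol = least_squares(residuals, p0, args=(X, uv, z, depth_idx, f, c))

Keeping the depth term sparse mirrors the abstract's point that depth is exploited selectively: a handful of reliable depth samples is enough to anchor the scale and distance of the fit without letting noisy depth dominate the 2D landmark evidence.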