{"title":"Real-time Face Video Swapping From A Single Portrait","authors":"Luming Ma, Z. Deng","doi":"10.1145/3384382.3384519","DOIUrl":null,"url":null,"abstract":"We present a novel high-fidelity real-time method to replace the face in a target video clip by the face from a single source portrait image. Specifically, we first reconstruct the illumination, albedo, camera parameters, and wrinkle-level geometric details from both the source image and the target video. Then, the albedo of the source face is modified by a novel harmonization method to match the target face. Finally, the source face is re-rendered and blended into the target video using the lighting and camera parameters from the target video. Our method runs fully automatically and at real-time rate on any target face captured by cameras or from legacy video. More importantly, unlike existing deep learning based methods, our method does not need to pre-train any models, i.e., pre-collecting a large image/video dataset of the source or target face for model training is not needed. We demonstrate that a high level of video-realism can be achieved by our method on a variety of human faces with different identities, ethnicities, skin colors, and expressions.","PeriodicalId":91160,"journal":{"name":"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games","volume":"2 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3384382.3384519","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
We present a novel high-fidelity real-time method to replace the face in a target video clip by the face from a single source portrait image. Specifically, we first reconstruct the illumination, albedo, camera parameters, and wrinkle-level geometric details from both the source image and the target video. Then, the albedo of the source face is modified by a novel harmonization method to match the target face. Finally, the source face is re-rendered and blended into the target video using the lighting and camera parameters from the target video. Our method runs fully automatically and at real-time rate on any target face captured by cameras or from legacy video. More importantly, unlike existing deep learning based methods, our method does not need to pre-train any models, i.e., pre-collecting a large image/video dataset of the source or target face for model training is not needed. We demonstrate that a high level of video-realism can be achieved by our method on a variety of human faces with different identities, ethnicities, skin colors, and expressions.