Intrinsic Imaging Model Enhanced Contrastive Face Representation Learning

Haomiao Sun, S. Shan, Hu Han
{"title":"Intrinsic Imaging Model Enhanced Contrastive Face Representation Learning","authors":"Haomiao Sun, S. Shan, Hu Han","doi":"10.1109/FG57933.2023.10042802","DOIUrl":null,"url":null,"abstract":"Humans can easily perceive numerous information from faces, only part of which has been achieved by a machine, thanks to the availability of large-scale face images with supervision signals of those specific tasks. More face perception tasks, like rare expression or attribute recognition, and genetic syndrome diagnosis, are not solved due to a critical shortage of supervised data. One possible way to solve these tasks is leveraging ubiquitous large-scale unsupervised face images and building a foundation face model via methods like contrastive learning (CL), which is, however, not aware of the intrinsic physics of the human face. In consideration of this shortcoming, this paper proposes to enhance contrastive face representation learning by the physical imaging model. Specifically, besides the CL-backbone network, we also design an auxiliary bypass pathway to constrain the CL-backbone to support the ability of accurately re-rendering the face with a differentiable physical imaging model after decomposing an input face image into intrinsic 3D imaging factors. With this design, the CL network is endowed the capacity of implicitly “knowing” the 3D of the face rather than the 2D pixels only. In experiments, we learn face representations from the CelebA and WebFace-42M datasets in unsupervised mode and evaluate the generalization capability of the representations with three different downstream tasks in case of limited supervised data. The experimental results clearly justify the effectiveness of the proposed method.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FG57933.2023.10042802","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Humans can easily perceive a wealth of information from faces, yet machines have achieved only part of this ability, and only for tasks where large-scale face images with task-specific supervision signals are available. Many other face perception tasks, such as rare expression or attribute recognition and genetic syndrome diagnosis, remain unsolved due to a critical shortage of supervised data. One possible way to address these tasks is to leverage ubiquitous large-scale unsupervised face images and build a foundation face model via methods such as contrastive learning (CL), which, however, is unaware of the intrinsic physics of the human face. To address this shortcoming, this paper proposes to enhance contrastive face representation learning with a physical imaging model. Specifically, besides the CL-backbone network, we design an auxiliary bypass pathway that constrains the CL-backbone to support accurate re-rendering of the face with a differentiable physical imaging model after decomposing an input face image into intrinsic 3D imaging factors. With this design, the CL network is endowed with the capacity to implicitly "know" the 3D structure of the face rather than only its 2D pixels. In experiments, we learn face representations from the CelebA and WebFace-42M datasets in an unsupervised manner and evaluate the generalization capability of the representations on three different downstream tasks with limited supervised data. The experimental results clearly demonstrate the effectiveness of the proposed method.
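
The abstract describes combining a contrastive objective with an auxiliary bypass that decomposes the face into intrinsic 3D imaging factors and re-renders it with a differentiable imaging model. The sketch below illustrates one plausible way such a combined objective could be wired up; it is a minimal assumption-laden illustration, not the authors' implementation. The module names (CLWithIntrinsicBypass, lambertian_render), the choice of intrinsic factors (albedo, normals, spherical-harmonics-style lighting), the toy Lambertian shading, and the loss weighting are all hypothetical.

```python
# Hypothetical sketch: a CL backbone plus an intrinsic-decomposition bypass
# whose re-rendering error regularizes the shared representation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CLWithIntrinsicBypass(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 2048, proj_dim: int = 128):
        super().__init__()
        self.backbone = backbone                       # CL-backbone (e.g. a ResNet trunk)
        self.proj = nn.Linear(feat_dim, proj_dim)      # projection head for the contrastive loss
        # Hypothetical bypass heads predicting intrinsic imaging factors
        # at a low resolution from the shared features.
        self.albedo_head = nn.Linear(feat_dim, 3 * 32 * 32)
        self.normal_head = nn.Linear(feat_dim, 3 * 32 * 32)
        self.light_head = nn.Linear(feat_dim, 9)       # toy lighting code

    def forward(self, x):
        f = self.backbone(x)                           # shared representation
        z = F.normalize(self.proj(f), dim=-1)          # embedding for contrastive learning
        b = x.size(0)
        albedo = torch.sigmoid(self.albedo_head(f)).view(b, 3, 32, 32)
        normals = F.normalize(self.normal_head(f).view(b, 3, 32, 32), dim=1)
        light = self.light_head(f)
        return z, albedo, normals, light


def lambertian_render(albedo, normals, light):
    """Toy differentiable shading (albedo * max(n.l, 0)); a stand-in for the
    paper's physical imaging model, not the actual renderer."""
    direction = F.normalize(light[:, :3], dim=-1).view(-1, 3, 1, 1)
    shading = (normals * direction).sum(dim=1, keepdim=True).clamp(min=0.0)
    return albedo * shading


def info_nce(z1, z2, tau=0.2):
    """Standard InfoNCE between two augmented views (SimCLR-style)."""
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)


def training_step(model, view1, view2, target_lowres, w_recon=1.0):
    """Combined loss: contrastive term plus a re-rendering constraint that
    forces the backbone features to carry intrinsic 3D imaging factors."""
    z1, albedo, normals, light = model(view1)
    z2, _, _, _ = model(view2)
    loss_cl = info_nce(z1, z2)
    recon = lambertian_render(albedo, normals, light)
    loss_recon = F.l1_loss(recon, target_lowres)       # target: downsampled input face
    return loss_cl + w_recon * loss_recon
```

In this reading, the bypass pathway adds no parameters to the downstream-facing backbone; it only shapes the features during pre-training, so at inference time the representation can be used like any other CL embedding.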