RULe: Relocalization-Uniformization-Landmark Estimation Network for Real-Time Face Alignment in Degraded Conditions

Arnaud Dapogny, Gauthier Tallec, Jules Bonnard, Edouard Yvinec, Kévin Bailly
{"title":"RULe: Relocalization-Uniformization-Landmark Estimation Network for Real-Time Face Alignment in Degraded Conditions","authors":"Arnaud Dapogny, Gauthier Tallec, Jules Bonnard, Edouard Yvinec, Kévin Bailly","doi":"10.1109/FG57933.2023.10042577","DOIUrl":null,"url":null,"abstract":"Face alignment refers to the process of estimating the position of a number of salient landmarks on face images or videos, such as mouth and eye corners, nose tip, etc. With the availability of large annotated databases and the rise of deep learning-based methods, face alignment as a domain has matured to a point where it can be applied in more or less unconstrained conditions, e.g. non-frontal head poses, presence of heavy make-up or partial occlusions. However, when considering real-case alignment on videos with possibly low frame rates, we need to make sure that the algorithms are robust to jittering of the face bounding box localization, low-resolution of the face crops, possible bad environmental lighting, brightness, and presence of noise. To tackle these issues, we propose RULe, a three-staged Relocalization-Uniformization-Landmark Estimation network. In the first stage, an initial loosely localized bounding box gets refined to output a well centered face crop, thus reducing the variability of the images prior to passing them to the subsequent stage. Then, in the second stage, the face style is uniformized (using adversarial learning as well as perceptual losses) to correct low resolution or variations of brightness/contrast. Finally, the third stage outputs a precise landmark estimation given such enhanced face crop using a cascaded compact model trained using hint-based knowledge distillation. We show through a variety of experiments that RULe achieves real-time face alignment with state-of-the-art precision in heavily degraded conditions.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FG57933.2023.10042577","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Face alignment refers to the process of estimating the positions of a number of salient landmarks on face images or videos, such as the mouth and eye corners, the nose tip, etc. With the availability of large annotated databases and the rise of deep learning-based methods, face alignment as a domain has matured to the point where it can be applied in largely unconstrained conditions, e.g. non-frontal head poses, heavy make-up, or partial occlusions. However, when considering real-world alignment on videos with possibly low frame rates, we need to make sure that the algorithms are robust to jitter in the face bounding box localization, low resolution of the face crops, poor environmental lighting, brightness variations, and noise. To tackle these issues, we propose RULe, a three-stage Relocalization-Uniformization-Landmark Estimation network. In the first stage, an initial, loosely localized bounding box is refined to output a well-centered face crop, thus reducing the variability of the images before they are passed to the subsequent stages. In the second stage, the face style is uniformized (using adversarial learning as well as perceptual losses) to correct for low resolution and variations in brightness/contrast. Finally, the third stage outputs a precise landmark estimate from this enhanced face crop, using a cascaded compact model trained with hint-based knowledge distillation. We show through a variety of experiments that RULe achieves real-time face alignment with state-of-the-art precision in heavily degraded conditions.
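As a rough illustration of the three-stage pipeline described above, the sketch below chains toy relocalization, uniformization, and landmark-estimation modules in PyTorch. All module architectures, the 68-landmark output size, and the FitNets-style hint loss are assumptions made for illustration; they are not details taken from the paper.

```python
# Minimal sketch of a RULe-like three-stage pipeline (assumed architectures,
# not the authors' actual models).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Relocalizer(nn.Module):
    """Stage 1: refine a loose bounding box into a well-centered face crop."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 4),  # (dx, dy, dw, dh) bounding box correction
        )

    def forward(self, crop):
        return self.net(crop)


class Uniformizer(nn.Module):
    """Stage 2: style uniformization; per the abstract, this stage is trained with
    adversarial and perceptual losses to correct resolution/brightness/contrast."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, crop):
        return self.net(crop)  # enhanced face crop


class LandmarkEstimator(nn.Module):
    """Stage 3: compact landmark regressor (distilled from a larger teacher)."""
    def __init__(self, n_landmarks=68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 2 * n_landmarks),  # (x, y) per landmark
        )

    def forward(self, crop):
        return self.net(crop)


def hint_loss(student_feat, teacher_feat, adapter):
    """FitNets-style hint loss: match adapted student features to frozen teacher
    features (one plausible reading of 'hint-based knowledge distillation')."""
    return F.mse_loss(adapter(student_feat), teacher_feat.detach())


def rule_pipeline(crop, reloc, unif, landm):
    """Chain the three stages: relocalize -> uniformize -> estimate landmarks."""
    box_delta = reloc(crop)   # refined box; re-cropping is omitted in this sketch
    enhanced = unif(crop)     # normalized resolution / brightness / contrast
    return landm(enhanced)    # landmark coordinates


if __name__ == "__main__":
    x = torch.randn(1, 3, 128, 128)  # dummy face crop
    out = rule_pipeline(x, Relocalizer(), Uniformizer(), LandmarkEstimator())
    print(out.shape)  # torch.Size([1, 136])
```

In the actual system the refined box from stage 1 would presumably be used to re-crop the input before uniformization; the sketch skips that step for brevity.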