Self-supervised random mask attention GAN in tackling pose-invariant face recognition

IF 7.5 · JCR Q1, Computer Science, Artificial Intelligence · CAS Tier 1, Computer Science
Jiashu Liao, Tanaya Guha, Victor Sanchez
{"title":"Self-supervised random mask attention GAN in tackling pose-invariant face recognition","authors":"Jiashu Liao ,&nbsp;Tanaya Guha ,&nbsp;Victor Sanchez","doi":"10.1016/j.patcog.2024.111112","DOIUrl":null,"url":null,"abstract":"<div><div>Pose Invariant Face Recognition (PIFR) has significantly advanced with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making it challenging and costly to acquire the necessary training data. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called <em>Mask Rotate</em>. Our framework introduces a novel training approach that requires no paired ground truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies the face image frontalization task by transforming it into a face image completion task. During the inference or testing stage, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised <em>Random Mask</em> Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by considering them as randomly masked regions. Furthermore, our proposed <em>Mask Rotate</em> framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the <em>Mask Rotate</em> framework eliminates the requirement for complex computations during training and achieves strong results, both qualitative and quantitative, compared to the state-of-the-art.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111112"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S003132032400863X","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Pose Invariant Face Recognition (PIFR) has significantly advanced with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making it challenging and costly to acquire the necessary training data. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called Mask Rotate. Our framework introduces a novel training approach that requires no paired ground truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies the face image frontalization task by transforming it into a face image completion task. During the inference or testing stage, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised Random Mask Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by considering them as randomly masked regions. Furthermore, our proposed Mask Rotate framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the Mask Rotate framework eliminates the requirement for complex computations during training and achieves strong results, both qualitative and quantitative, compared to the state-of-the-art.
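The abstract describes the idea but not the implementation. The sketch below illustrates, in minimal PyTorch, the random-mask self-supervision it outlines: random regions of an unpaired face image are masked out and a generator is trained to complete them, so each image serves as its own ground truth and no paired frontal views are required. The `gen`, `disc`, `render_model`, and `postproc` modules, their call signatures, and all hyperparameters are hypothetical placeholders, not the paper's RMAGAN or Mask Rotate code.

```python
# Minimal sketch (assumptions stated above): random-mask self-supervised
# completion training, plus the chained inference step the abstract describes.
import torch
import torch.nn.functional as F

def random_mask(images, num_blocks=4, block=32):
    """Zero out `num_blocks` random square regions per image.
    Returns the masked batch and a binary mask (1 = kept, 0 = missing)."""
    b, _, h, w = images.shape
    mask = torch.ones(b, 1, h, w, device=images.device)
    for i in range(b):
        for _ in range(num_blocks):
            y = torch.randint(0, h - block + 1, (1,)).item()
            x = torch.randint(0, w - block + 1, (1,)).item()
            mask[i, :, y:y + block, x:x + block] = 0.0
    return images * mask, mask

def train_step(gen, disc, g_opt, d_opt, real, adv_weight=0.1):
    """One self-supervised step: the unmasked image itself is the target."""
    masked, mask = random_mask(real)

    # Discriminator update: real faces vs. completed faces.
    with torch.no_grad():
        fake = gen(masked, mask)
    d_real, d_fake = disc(real), disc(fake)
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: reconstruct the masked pixels and fool the discriminator.
    fake = gen(masked, mask)
    rec_loss = F.l1_loss(fake * (1 - mask), real * (1 - mask))
    adv_logits = disc(fake)
    g_loss = rec_loss + adv_weight * F.binary_cross_entropy_with_logits(
        adv_logits, torch.ones_like(adv_logits))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

def frontalize(render_model, gen, postproc, profile_img):
    """Inference sketch: a pre-trained rendering model outputs a frontal view
    plus a mask of missing-texture regions; the generator fills them in,
    and a post-processing model refines the result (interfaces assumed)."""
    frontal, mask = render_model(profile_img)
    completed = gen(frontal * mask, mask)
    merged = frontal * mask + completed * (1 - mask)  # keep rendered pixels, fill the rest
    return postproc(merged)
```

The inference function mirrors the pipeline stated in the abstract (render, treat missing texture as a masked region, complete, post-process); the exact interfaces of the rendering and post-processing models are assumptions for illustration only.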
Source Journal
Pattern Recognition
Category: Engineering & Technology / Engineering, Electrical & Electronic
CiteScore: 14.40
Self-citation rate: 16.20%
Annual articles: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.