SFace2: Synthetic-Based Face Recognition With w-Space Identity-Driven Sampling
Authors: Fadi Boutros; Marco Huber; Anh Thi Luu; Patrick Siebke; Naser Damer
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, no. 3, pp. 290-303
Publication date: 2024-02-29
Publication type: Journal Article
DOI: 10.1109/TBIOM.2024.3371502
URL: https://ieeexplore.ieee.org/document/10454585/
Citations: 0
Abstract
The use of synthetic data for training neural networks has recently received increased attention, especially in the area of face recognition. This is mainly motivated by growing privacy, ethical, and legal concerns about using privacy-sensitive authentic data to train face recognition models. Many authentic datasets, such as MS-Celeb-1M or VGGFace2, that have been widely used to train state-of-the-art deep face recognition models have been retracted and are no longer maintained or provided by official sources, as they were often collected without explicit consent. To this end, we first propose a synthetic face generation approach, SFace, which utilizes a class-conditional generative adversarial network to generate class-labeled synthetic face images. To evaluate the privacy aspect of using such synthetic data in face recognition development, we provide an extensive evaluation of the identity relation between the generated synthetic dataset and the original authentic dataset used to train the generative model. The investigation showed that associating an identity in the authentic dataset with the one that has the same class label in the synthetic dataset is hardly possible, strengthening the possibility of privacy-aware face recognition training. We then propose three different learning strategies to train the face recognition model on our privacy-friendly dataset, SFace, and report the results on five authentic benchmarks, demonstrating its high potential. Noticing the relatively low identity discrimination in SFace compared with authentic data, we analyse the w-space of the class-conditional generator and find identity information that is highly correlated with that in the embedding space. Based on this finding, we propose an approach that performs identity-driven sampling in the w-space to generate data with higher identity discrimination, SFace2. Our experiments show the disentanglement of the latent w-space and the benefit of training face recognition models on the more identity-discriminative synthetic dataset SFace2.
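The notion of identity discrimination used above can be made concrete with a small sketch. The following is not the evaluation protocol of the paper, which the abstract does not specify. It is a minimal illustration, assuming that identity discrimination is approximated by the gap between the mean cosine similarity of same-label (genuine) pairs and of different-label (impostor) pairs in an embedding space. The helper `toy_dataset` and the metric `separation` are hypothetical names introduced for illustration only, and the random vectors stand in for embeddings that a face recognition model would produce.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def genuine_impostor_scores(embeddings, labels):
    """Split all pairwise cosine scores into same-label and different-label lists."""
    genuine, impostor = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            score = cosine(embeddings[i], embeddings[j])
            (genuine if labels[i] == labels[j] else impostor).append(score)
    return genuine, impostor

def separation(genuine, impostor):
    """Mean genuine score minus mean impostor score (illustrative metric, an assumption)."""
    return sum(genuine) / len(genuine) - sum(impostor) / len(impostor)

def toy_dataset(num_ids, per_id, dim, noise, seed=0):
    """Hypothetical stand-in for embeddings: a random class centre plus Gaussian noise."""
    rng = random.Random(seed)
    embeddings, labels = [], []
    for identity in range(num_ids):
        centre = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(per_id):
            embeddings.append([c + rng.gauss(0.0, noise) for c in centre])
            labels.append(identity)
    return embeddings, labels

# Low intra-class noise mimics a dataset with well-separated identities,
# high intra-class noise mimics one with weak identity discrimination.
tight = separation(*genuine_impostor_scores(*toy_dataset(5, 4, 32, noise=0.2)))
loose = separation(*genuine_impostor_scores(*toy_dataset(5, 4, 32, noise=2.0)))
print(f"low intra-class noise:  {tight:.3f}")
print(f"high intra-class noise: {loose:.3f}")
```

In this toy setting, the dataset with smaller intra-class variation yields the larger gap, which is the kind of difference the abstract describes between a more and a less identity-discriminative dataset.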