Text-Guided Synthesis of Masked Face Images

IF 5.2; Zone 3 (Computer Science); JCR Q1 (Computer Science, Information Systems)

ACM Transactions on Multimedia Computing Communications and Applications. Pub Date: 2024-03-30. DOI: 10.1145/3654667

Anjali T, Masilamani V

Citations: 0

Abstract

The COVID-19 pandemic has shown that wearing a face mask protects against the spread of respiratory viruses. Face authentication systems, which are trained on facial key points such as the eyes, nose, and mouth, find it difficult to identify a person when most of the face is covered by a mask. Removing the mask for authentication can spread infection. The possible solutions are: (a) train face recognition systems to identify a person from upper-face features; (b) reconstruct the complete face with a generative model; (c) train the model on a dataset of masked faces. In this paper, we explore the scope of generative models for image synthesis. We used Stable Diffusion to generate masked face images of popular celebrities from various text prompts. The result is a realistic dataset of 15K masked face images of 100 celebrities, called the Realistic Synthetic Masked Face Dataset (RSMFD). The model and the generated dataset will be made public so that researchers can augment the dataset. To our knowledge, this is the largest masked face recognition dataset with realistic images. The generated images were tested on popular deep face recognition models and achieved significant results. The dataset was also used to train and test several well-known image classification models, and the results are competitive. The dataset is available at: https://drive.google.com/drive/folders/1yetcgUOL1TOP4rod1geGsOkIrIJHtcEw?usp=sharing
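
The abstract describes generating 15K images of 100 identities from "various text prompts" but does not list the prompts themselves. The following is a minimal sketch of how such a prompt grid might be enumerated and how many images each prompt would need to contribute. The template, the mask types, the settings, and the placeholder identity names are all assumptions made for illustration, not the authors' actual configuration.

```python
from itertools import product

# Hypothetical prompt vocabulary: the abstract does not give the real prompts.
MASK_TYPES = ["surgical mask", "N95 mask", "cloth mask"]
SETTINGS = ["in an office", "on a street", "at an airport"]
TEMPLATE = "a realistic photo of {name} wearing a {mask} {setting}"


def build_prompts(names, mask_types=MASK_TYPES, settings=SETTINGS):
    """Return one (name, prompt) pair per identity/mask/setting combination."""
    return [
        (name, TEMPLATE.format(name=name, mask=mask, setting=setting))
        for name, mask, setting in product(names, mask_types, settings)
    ]


def images_per_prompt(total_images, num_identities, prompts_per_identity):
    """Images to sample per prompt so the dataset reaches total_images (rounded up)."""
    per_identity = -(-total_images // num_identities)  # ceiling division
    return -(-per_identity // prompts_per_identity)


# Placeholder identities: the source states 100 celebrities but does not name them.
names = [f"identity_{i:03d}" for i in range(100)]
prompts = build_prompts(names)

# 15K images over 100 identities is 150 per identity on average.
samples = images_per_prompt(15_000, len(names), len(MASK_TYPES) * len(SETTINGS))
```

Each prompt would then be passed to a text-to-image diffusion pipeline, with the identity name kept as the class label for the later recognition and classification experiments. Rounding up means this particular grid slightly overshoots 15K, so a real run would trim or filter the surplus.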

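Source journal

ACM Transactions on Multimedia Computing Communications and Applications (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 8.50

Self-citation rate: 5.90%

Articles published: 285

Review time: 7.5 months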

Journal description: The ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) is the flagship publication of the ACM Special Interest Group in Multimedia (SIGMM). It solicits paper submissions on all aspects of multimedia. Papers on single media (for instance, audio, video, animation) and their processing are also welcome. TOMM is a peer-reviewed, archival journal, available in both print and digital form. The journal is published quarterly, with roughly seven 23-page articles in each issue. In addition, all Special Issues are published online-only to ensure timely publication. The transactions consist primarily of research papers. This is an archival journal, and the papers are intended to have lasting importance and value over time. In general, papers whose primary focus is on particular multimedia products or the current state of the industry will not be included.