Generating and Improving a Dataset of Masked Faces Using Data Augmentation

Q4 Biochemistry, Genetics and Molecular Biology

Journal of Biomolecular Techniques | Pub Date: 2023-06-10 | DOI: 10.51173/jt.v5i2.1140

Waleed Ayad, Siraj Qays, Ali Al-Naji


Citations: 0

Abstract


Before the spread of COVID-19 in 2020, modern face recognition systems performed excellently. Once countries required their populations to wear masks, the discriminative ability of those systems dropped noticeably, because they had been trained on large-scale datasets of unmasked faces and no large-scale masked-face datasets were available at the time. To help address this shortage, a method is presented that creates simulated masks and overlays them on faces in two main steps. The first step detects, aligns, and crops the faces in an unmasked-face dataset and then applies simulated masks to them using the dlib-ml library. This method was used to generate a masked-face dataset (CASIA-mask). The second step applies five data augmentation techniques to the generated dataset. To evaluate the masked dataset and the augmentation, FaceNet, one of the latest and most important facial recognition systems, was trained on the masked dataset and achieved an accuracy of 96.4%. The same system achieved 97.71% when trained on CASIA-mask together with the augmented data.
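The two steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say which landmarks define the mask or which five augmentations were used. The sketch assumes the common 68-point landmark layout (indices 0-16 for the jawline, 27-30 for the nose bridge), fills a polygon built from those points, and shows a horizontal flip as one typical augmentation. The function names `mask_polygon`, `overlay_mask`, and `hflip` are hypothetical.

```python
# Sketch only. Landmark indices follow the common 68-point layout
# (0-16 jawline, 27-30 nose bridge). The paper's actual mask template is
# not given in the abstract, so the polygon choice below is an assumption.


def mask_polygon(landmarks):
    """Build a mask outline from 68 (x, y) landmarks.

    Uses jawline points 1..15 and closes the shape over nose-bridge point 29.
    """
    if len(landmarks) != 68:
        raise ValueError("expected 68 landmarks")
    return [landmarks[i] for i in range(1, 16)] + [landmarks[29]]


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a point against a simple polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay_mask(image, polygon, color=255):
    """Return a copy of a grayscale image (list of rows) with the polygon filled."""
    out = [row[:] for row in image]
    for y, row in enumerate(out):
        for x in range(len(row)):
            # Test the pixel centre against the polygon
            if point_in_polygon(x + 0.5, y + 0.5, polygon):
                row[x] = color
    return out


def hflip(image):
    """Horizontal flip: a common augmentation, not necessarily one of the paper's five."""
    return [row[::-1] for row in image]
```

In a real pipeline the landmarks would come from a detector such as dlib's `get_frontal_face_detector()` followed by a `shape_predictor` loaded with a 68-point model, and the image would be an array rather than nested lists. The pure-Python version is used here only to keep the geometry visible.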


Source journal

Journal of Biomolecular Techniques (Biochemistry, Genetics and Molecular Biology: Molecular Biology)

CiteScore: 2.50

Self-citation rate: 0.00%

Journal description: The Journal of Biomolecular Techniques is a peer-reviewed publication issued five times a year by the Association of Biomolecular Resource Facilities. The Journal was established to promote the central role biotechnology plays in contemporary research activities, to disseminate information among biomolecular resource facilities, and to communicate the biotechnology research conducted by the Association’s Research Groups and members, as well as other investigators.