{"title":"A cascaded deep-learning-based model for face mask detection","authors":"Akhil Kumar","doi":"10.1108/dta-02-2022-0076","DOIUrl":null,"url":null,"abstract":"PurposeThis work aims to present a deep learning model for face mask detection in surveillance environments such as automatic teller machines (ATMs), banks, etc. to identify persons wearing face masks. In surveillance environments, complete visibility of the face area is a guideline, and criminals and law offenders commit crimes by hiding their faces behind a face mask. The face mask detector model proposed in this work can be used as a tool and integrated with surveillance cameras in autonomous surveillance environments to identify and catch law offenders and criminals.Design/methodology/approachThe proposed face mask detector is developed by integrating the residual network (ResNet)34 feature extractor on top of three You Only Look Once (YOLO) detection layers along with the usage of the spatial pyramid pooling (SPP) layer to extract a rich and dense feature map. Furthermore, at the training time, data augmentation operations such as Mosaic and MixUp have been applied to the feature extraction network so that it can get trained with images of varying complexities. The proposed detector is trained and tested over a custom face mask detection dataset consisting of 52,635 images. For validation, comparisons have been provided with the performance of YOLO v1, v2, tiny YOLO v1, v2, v3 and v4 and other benchmark work present in the literature by evaluating performance metrics such as precision, recall, F1 score, mean average precision (mAP) for the overall dataset and average precision (AP) for each class of the dataset.FindingsThe proposed face mask detector achieved 4.75–9.75 per cent higher detection accuracy in terms of mAP, 5–31 per cent higher AP for detection of faces with masks and, specifically, 2–30 per cent higher AP for detection of face masks on the face region as compared to the tested baseline variants of YOLO. Furthermore, the usage of the ResNet34 feature extractor and SPP layer in the proposed detection model reduced the training time and the detection time. The proposed face mask detection model can perform detection over an image in 0.45 s, which is 0.2–0.15 s lesser than that for other tested YOLO variants, thus making the proposed detection model perform detections at a higher speed.Research limitations/implicationsThe proposed face mask detector model can be utilized as a tool to detect persons with face masks who are a potential threat to the automatic surveillance environments such as ATMs, banks, airport security checks, etc. The other research implication of the proposed work is that it can be trained and tested for other object detection problems such as cancer detection in images, fish species detection, vehicle detection, etc.Practical implicationsThe proposed face mask detector can be integrated with automatic surveillance systems and used as a tool to detect persons with face masks who are potential threats to ATMs, banks, etc. and in the present times of COVID-19 to detect if the people are following a COVID-appropriate behavior of wearing a face mask or not in the public areas.Originality/valueThe novelty of this work lies in the usage of the ResNet34 feature extractor with YOLO detection layers, which makes the proposed model a compact and powerful convolutional neural-network-based face mask detector model. Furthermore, the SPP layer has been applied to the ResNet34 feature extractor to make it able to extract a rich and dense feature map. The other novelty of the present work is the implementation of Mosaic and MixUp data augmentation in the training network that provided the feature extractor with 3× images of varying complexities and orientations and further aided in achieving higher detection accuracy. The proposed model is novel in terms of extracting rich features, performing augmentation at the training time and achieving high detection accuracy while maintaining the detection speed.","PeriodicalId":56156,"journal":{"name":"Data Technologies and Applications","volume":"1 1","pages":"84-107"},"PeriodicalIF":1.7000,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Technologies and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1108/dta-02-2022-0076","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1
Abstract
PurposeThis work aims to present a deep learning model for face mask detection in surveillance environments such as automatic teller machines (ATMs), banks, etc. to identify persons wearing face masks. In surveillance environments, complete visibility of the face area is a guideline, and criminals and law offenders commit crimes by hiding their faces behind a face mask. The face mask detector model proposed in this work can be used as a tool and integrated with surveillance cameras in autonomous surveillance environments to identify and catch law offenders and criminals.Design/methodology/approachThe proposed face mask detector is developed by integrating the residual network (ResNet)34 feature extractor on top of three You Only Look Once (YOLO) detection layers along with the usage of the spatial pyramid pooling (SPP) layer to extract a rich and dense feature map. Furthermore, at the training time, data augmentation operations such as Mosaic and MixUp have been applied to the feature extraction network so that it can get trained with images of varying complexities. The proposed detector is trained and tested over a custom face mask detection dataset consisting of 52,635 images. For validation, comparisons have been provided with the performance of YOLO v1, v2, tiny YOLO v1, v2, v3 and v4 and other benchmark work present in the literature by evaluating performance metrics such as precision, recall, F1 score, mean average precision (mAP) for the overall dataset and average precision (AP) for each class of the dataset.FindingsThe proposed face mask detector achieved 4.75–9.75 per cent higher detection accuracy in terms of mAP, 5–31 per cent higher AP for detection of faces with masks and, specifically, 2–30 per cent higher AP for detection of face masks on the face region as compared to the tested baseline variants of YOLO. Furthermore, the usage of the ResNet34 feature extractor and SPP layer in the proposed detection model reduced the training time and the detection time. The proposed face mask detection model can perform detection over an image in 0.45 s, which is 0.2–0.15 s lesser than that for other tested YOLO variants, thus making the proposed detection model perform detections at a higher speed.Research limitations/implicationsThe proposed face mask detector model can be utilized as a tool to detect persons with face masks who are a potential threat to the automatic surveillance environments such as ATMs, banks, airport security checks, etc. The other research implication of the proposed work is that it can be trained and tested for other object detection problems such as cancer detection in images, fish species detection, vehicle detection, etc.Practical implicationsThe proposed face mask detector can be integrated with automatic surveillance systems and used as a tool to detect persons with face masks who are potential threats to ATMs, banks, etc. and in the present times of COVID-19 to detect if the people are following a COVID-appropriate behavior of wearing a face mask or not in the public areas.Originality/valueThe novelty of this work lies in the usage of the ResNet34 feature extractor with YOLO detection layers, which makes the proposed model a compact and powerful convolutional neural-network-based face mask detector model. Furthermore, the SPP layer has been applied to the ResNet34 feature extractor to make it able to extract a rich and dense feature map. The other novelty of the present work is the implementation of Mosaic and MixUp data augmentation in the training network that provided the feature extractor with 3× images of varying complexities and orientations and further aided in achieving higher detection accuracy. The proposed model is novel in terms of extracting rich features, performing augmentation at the training time and achieving high detection accuracy while maintaining the detection speed.