Leveraging a Two-Level Attention Mechanism for Deep Face Recognition with Siamese One-Shot Learning
Arkan Mahmood Albayati, Wael Chtourou, F. Zarai
Journal of Robotics and Control (JRC), published 2024-01-11
DOI: 10.18196/jrc.v5i1.20135 (https://doi.org/10.18196/jrc.v5i1.20135)
Citations: 0
Abstract
Discriminative feature embeddings underpin large-scale face recognition. Many image-based face recognition networks are built on CNNs such as ResNets and VGG-nets. Humans attend to different parts of a face, whereas CNNs treat all regions of a face image equally. In NLP and computer vision, attention mechanisms are used to learn which parts of an input signal matter most. In this study, an inter-channel and inter-spatial attention mechanism is used to assess the significance of the components of a face image. Channel attention for face recognition usually computes one scalar per channel with Global Average Pooling (GAP). A recent study found that GAP mainly encodes the lowest-frequency information in a channel. To let the channel attention mechanism evaluate information at frequencies other than the lowest, we compress channels with the discrete cosine transform (DCT) instead of the GAP scalar representation. The feature map refined by spatial attention is then passed on to later layers. Together, channel and spatial attention improve CNN feature extraction for face recognition. The attention blocks can be arranged as channel-only, spatial-only, parallel, sequential, or channel-after-spatial. The proposed approach may outperform current attention-based face recognition methods on public datasets (Labelled Faces in the Wild, LFW).
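The relationship between GAP and DCT-based channel descriptors described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the unnormalised DCT-II basis, the split of channels into frequency groups, the two-layer gating weights `w1` and `w2`, and the parameter-free spatial mask are all assumptions made for illustration; a trained network would use learned layers in their place.

```python
import numpy as np


def dct_basis(h, w, u, v):
    """Unnormalised 2D DCT-II basis of frequency (u, v) on an h x w grid."""
    ys = np.cos(np.pi * u * (np.arange(h) + 0.5) / h)
    xs = np.cos(np.pi * v * (np.arange(w) + 0.5) / w)
    return np.outer(ys, xs)


def channel_descriptor(feat, freqs):
    """Compress each channel of feat (C, H, W) to one scalar.

    Channels are split into len(freqs) groups (an assumed design), and each
    group is projected onto a single DCT basis function.
    """
    c, h, w = feat.shape
    groups = np.array_split(np.arange(c), len(freqs))
    desc = np.empty(c)
    for idx, (u, v) in zip(groups, freqs):
        desc[idx] = (feat[idx] * dct_basis(h, w, u, v)).sum(axis=(1, 2))
    return desc


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, freqs, w1, w2):
    """Rescale channels with weights derived from the DCT descriptors."""
    desc = channel_descriptor(feat, freqs)
    hidden = np.maximum(w1 @ desc, 0.0)   # ReLU bottleneck (hypothetical weights)
    weights = sigmoid(w2 @ hidden)        # one weight in (0, 1) per channel
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Parameter-free stand-in for a learned spatial attention layer."""
    pooled = feat.mean(axis=0) + feat.max(axis=0)  # pool across channels
    return feat * sigmoid(pooled)[None, :, :]


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 7, 7))        # toy feature map: 8 channels, 7x7
w1 = rng.standard_normal((2, 8)) * 0.1       # hypothetical gating weights
w2 = rng.standard_normal((8, 2)) * 0.1
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]     # assumed frequency choice

# Channel attention followed by spatial attention, one possible arrangement.
out = spatial_attention(channel_attention(feat, freqs, w1, w2))
```

With `freqs = [(0, 0)]` the basis is constant, so each descriptor equals the channel sum, that is, GAP scaled by `H * W`. This is the sense in which GAP carries only the lowest-frequency component, and adding other `(u, v)` pairs exposes higher-frequency information to the attention weights.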