Face Pyramid Vision Transformer

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-10-21. DOI: 10.48550/arXiv.2210.11974

Khawar Islam, M. Zaheer, Arif Mahmood

Citations: 1

Abstract

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thereby reducing computation. An Improved Patch Embedding (IPE) algorithm is proposed to bring the benefits of CNNs (e.g., shared weights, local context, and receptive fields) into ViTs, so that features ranging from lower-level edges to higher-level semantic primitives can be modeled. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT is evaluated on seven benchmark datasets and compared with ten existing state-of-the-art methods, including CNNs, pure ViTs, and Convolutional ViTs. Despite having fewer parameters, FPVT demonstrates excellent performance relative to the compared methods. The project page is available at https://khawar-islam.github.io/fpvt/
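The abstract does not spell out how FSRA makes the computation cheaper. As a rough illustration only, the sketch below assumes it works like the spatial-reduction attention used in pyramid vision transformers: queries come from every token, while keys and values come from a feature map downsampled by a ratio R, so the attention matrix shrinks from N x N to N x N/R^2. It is single-head, uses random weights, omits the learned LayerNorm affine parameters, and every name in it (`spatial_reduction_attention`, `Wq`, `Wsr`, `Wk`, `Wv`, `R`) is hypothetical rather than taken from the FPVT code.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token vector to zero mean and unit variance
    # (no learned scale/shift in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def spatial_reduction_attention(x, h, w, R, params):
    """Hypothetical single-head spatial-reduction attention.

    x: (N, C) tokens of an h x w feature map in row-major order, N = h * w.
    R: reduction ratio; keys/values come from an (h/R) x (w/R) map.
    Returns the attended tokens (N, C) and the attention matrix (N, N / R^2).
    """
    n, c = x.shape
    assert n == h * w and h % R == 0 and w % R == 0
    q = x @ params["Wq"]  # queries for all N tokens: (N, C)

    # Spatial reduction: flatten each non-overlapping R x R window into one
    # vector and project it back to C channels (the same computation as a
    # convolution with kernel_size = stride = R).
    windows = x.reshape(h // R, R, w // R, R, c).transpose(0, 2, 1, 3, 4)
    merged = windows.reshape((h // R) * (w // R), R * R * c)
    reduced = layer_norm(merged @ params["Wsr"])  # (N / R^2, C)

    k = reduced @ params["Wk"]  # keys from the reduced map
    v = reduced @ params["Wv"]  # values from the reduced map
    attn = softmax(q @ k.T / np.sqrt(c))  # (N, N / R^2) instead of (N, N)
    return attn @ v, attn


# Small demo with random weights on a 14 x 14 map with 32 channels.
rng = np.random.default_rng(0)
h = w = 14
c = 32
R = 2
params = {
    "Wq": rng.standard_normal((c, c)) * 0.1,
    "Wsr": rng.standard_normal((R * R * c, c)) * 0.1,
    "Wk": rng.standard_normal((c, c)) * 0.1,
    "Wv": rng.standard_normal((c, c)) * 0.1,
}
x = rng.standard_normal((h * w, c))
out, attn = spatial_reduction_attention(x, h, w, R, params)
print(out.shape, attn.shape)  # (196, 32) (196, 49)
```

With R = 2 the attention matrix has 196 x 49 entries instead of 196 x 196, a 4x (that is, R^2) reduction, while the output keeps one token per input position. How FPVT actually chooses R per stage, and how FDR compacts the channel dimension, is not stated in the abstract and is not modeled here.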
