Efficient face anti-spoofing via head-aware transformer based knowledge distillation with 5 MB model parameters
Jun Zhang, Yunfei Zhang, Feixue Shao, Xuetao Ma, Shu Feng, Yongfei Wu, Daoxiang Zhou
Applied Soft Computing · Published 2024-09-10 · DOI: 10.1016/j.asoc.2024.112237
Citations: 0
Abstract
Although face recognition technology has been applied in many scenarios, it still suffers from many types of presentation attacks, making face anti-spoofing (FAS) a hot topic in computer vision. Recently, the vision transformer has become the mainstream architecture for FAS, but it typically relies on auxiliary information, sophisticated tricks and huge numbers of model parameters. Since face-based identity authentication usually takes place on mobile-like devices, designing an effective yet lightweight model is of great significance. Inspired by the powerful global modeling ability of self-attention and the model compression ability of knowledge distillation, a simple yet effective knowledge distillation approach is proposed for FAS under the transformer framework. Our primary idea is to leverage the rich knowledge of a teacher network pre-trained on large-scale face data to guide the learning of a lightweight student network. The main contributions of our method are threefold: (1) Feature- and logits-level distillation are combined to transfer the rich knowledge of the teacher to the student. (2) A head-aware strategy is proposed to deal with the dimension mismatch between the middle encoder layers of the teacher and student networks, in which a novel attention head correlation matrix is introduced. (3) Our method bridges the performance gap between teacher and student, and the resulting student network is extremely lightweight with only 5 MB of parameters. Extensive experiments are conducted on three public face-spoofing datasets, CASIA-FASD, Replay-Attack and OULU-NPU; the results demonstrate that our method obtains performance on par with or superior to most FAS methods and outperforms many knowledge distillation methods. Meanwhile, the distilled student network achieves excellent performance with 17× fewer parameters and 9× faster inference time compared to the teacher network. The code will be publicly available at https://github.com/Maricle-zhangjun/HaTFAS.
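The abstract combines logits-level distillation with a feature-level term built from an attention head correlation matrix. The paper's exact formulation is not given here, so the following PyTorch sketch is only illustrative: it pairs standard temperature-softened KD on the logits with a head-correlation loss computed from attention maps, whose size depends on the number of heads rather than the embedding dimension (one plausible way to sidestep the teacher/student dimension mismatch). All function names, head counts and loss weights are assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F


def logits_distillation(student_logits, teacher_logits, T=4.0):
    """Soft-label KD: KL divergence between temperature-softened distributions."""
    p_t = F.softmax(teacher_logits / T, dim=-1)
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)


def head_correlation(attn):
    """attn: (batch, heads, tokens, tokens) attention maps of one encoder layer.
    Each head's map is flattened and L2-normalized; the result is a
    (batch, heads, heads) correlation matrix whose size is independent of the
    embedding dimension."""
    b, h, n, _ = attn.shape
    flat = F.normalize(attn.reshape(b, h, n * n), dim=-1)
    return flat @ flat.transpose(1, 2)


def head_aware_feature_loss(student_attn, teacher_attn):
    """Match head-correlation matrices of a paired middle layer (this sketch
    assumes the paired teacher and student layers have the same head count)."""
    return F.mse_loss(head_correlation(student_attn),
                      head_correlation(teacher_attn))


if __name__ == "__main__":
    # Dummy tensors only; batch size, head count, token count and the unit
    # loss weights below are placeholders.
    s_logits, t_logits = torch.randn(8, 2), torch.randn(8, 2)
    s_attn, t_attn = torch.rand(8, 6, 197, 197), torch.rand(8, 6, 197, 197)
    labels = torch.randint(0, 2, (8,))
    loss = (F.cross_entropy(s_logits, labels)
            + 1.0 * logits_distillation(s_logits, t_logits)
            + 1.0 * head_aware_feature_loss(s_attn, t_attn))
    print(loss.item())
```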
About the journal
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is on publishing the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.