Si Chen, Xueyan Zhu, Da-han Wang, Shunzhi Zhu, Yun Wu
{"title":"基于自蒸馏的多区域变压器人脸属性识别","authors":"Si Chen, Xueyan Zhu, Da-han Wang, Shunzhi Zhu, Yun Wu","doi":"10.1109/FG57933.2023.10042513","DOIUrl":null,"url":null,"abstract":"Recently, transformers have shown great promising performance in various computer vision tasks. However, the current transformer based methods ignore the information exchanges between transformer blocks, and they have not been applied in the facial attribute recognition task. In this paper, we propose a multi-zone transformer based on self-distillation for FAR, termed MZTS, to predict the facial attributes. A multi-zone transformer encoder is firstly presented to achieve the interactions of the different transformer encoder blocks, thus avoiding forgetting the effective information between the transformer encoder block groups during the iteration process. Furthermore, we introduce a new self-distillation mechanism based on class tokens, which distills the class tokens obtained from the last transformer encoder block group to the other shallow groups by interacting with the significant information between the different transformer blocks through attention. Extensive experiments on the challenging CelebA and LFWA datasets have demonstrated the excellent performance of the proposed method for FAR.","PeriodicalId":318766,"journal":{"name":"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Multi-Zone Transformer Based on Self-Distillation for Facial Attribute Recognition\",\"authors\":\"Si Chen, Xueyan Zhu, Da-han Wang, Shunzhi Zhu, Yun Wu\",\"doi\":\"10.1109/FG57933.2023.10042513\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, transformers have shown great promising performance in various computer vision tasks. 
However, the current transformer based methods ignore the information exchanges between transformer blocks, and they have not been applied in the facial attribute recognition task. In this paper, we propose a multi-zone transformer based on self-distillation for FAR, termed MZTS, to predict the facial attributes. A multi-zone transformer encoder is firstly presented to achieve the interactions of the different transformer encoder blocks, thus avoiding forgetting the effective information between the transformer encoder block groups during the iteration process. Furthermore, we introduce a new self-distillation mechanism based on class tokens, which distills the class tokens obtained from the last transformer encoder block group to the other shallow groups by interacting with the significant information between the different transformer blocks through attention. Extensive experiments on the challenging CelebA and LFWA datasets have demonstrated the excellent performance of the proposed method for FAR.\",\"PeriodicalId\":318766,\"journal\":{\"name\":\"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)\",\"volume\":\"29 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FG57933.2023.10042513\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 17th International Conference on Automatic Face and 
Gesture Recognition (FG)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FG57933.2023.10042513","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-Zone Transformer Based on Self-Distillation for Facial Attribute Recognition
Recently, transformers have shown highly promising performance in various computer vision tasks. However, current transformer-based methods ignore the information exchange between transformer blocks, and they have not been applied to facial attribute recognition (FAR). In this paper, we propose a multi-zone transformer based on self-distillation, termed MZTS, to predict facial attributes. We first present a multi-zone transformer encoder that enables interaction among the different transformer encoder blocks, preventing effective information from being forgotten between the encoder block groups during iteration. Furthermore, we introduce a new self-distillation mechanism based on class tokens, which distills the class tokens obtained from the last transformer encoder block group into the shallower groups, exchanging salient information between the different transformer blocks through attention. Extensive experiments on the challenging CelebA and LFWA datasets demonstrate the excellent performance of the proposed method for FAR.
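The class-token self-distillation described above can be illustrated with a minimal numpy sketch. This is not the paper's MZTS implementation: the class tokens, group count, embedding size, and attribute head below are all hypothetical stand-ins. The sketch only shows the distillation objective itself — the prediction from the deepest encoder block group acts as a fixed teacher, and each shallower group's class-token prediction is pulled toward it.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def kl_div(p, q, eps=1e-9):
    # Mean KL divergence KL(p || q) over a batch of distributions.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) / p.shape[0])

# Hypothetical class tokens after each of 3 encoder block groups
# (batch of 4 faces, embedding dim 16) -- stand-ins for real ViT activations.
cls_tokens = [rng.normal(size=(4, 16)) for _ in range(3)]

# Shared attribute head mapping a class token to 40 attribute logits
# (40 matches the CelebA attribute count; the weights here are random).
W = rng.normal(size=(16, 40)) * 0.1
logits = [t @ W for t in cls_tokens]

# The deepest group is the teacher; it would be detached from gradients.
teacher = softmax(logits[-1])

# Self-distillation loss: each shallow group's prediction is pulled
# toward the teacher's distribution.
distill_loss = sum(kl_div(teacher, softmax(l)) for l in logits[:-1])
```

In training, this loss would be added to the ordinary attribute-classification loss, so that shallow block groups retain the information the final group deems discriminative rather than forgetting it across iterations.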