Information gap based knowledge distillation for occluded facial expression recognition

IF 4.2 | CAS Tier 3 (Computer Science) | Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Yan Zhang, Zenghui Li, Duo Shen, Ke Wang, Jia Li, Chenxing Xia
{"title":"Information gap based knowledge distillation for occluded facial expression recognition","authors":"Yan Zhang ,&nbsp;Zenghui Li ,&nbsp;Duo Shen ,&nbsp;Ke Wang ,&nbsp;Jia Li ,&nbsp;Chenxing Xia","doi":"10.1016/j.imavis.2024.105365","DOIUrl":null,"url":null,"abstract":"<div><div>Facial Expression Recognition (FER) with occlusion presents a challenging task in computer vision because facial occlusions result in poor visual data features. Recently, the region attention technique has been introduced to address this problem by researchers, which make the model perceive occluded regions of the face and prioritize the most discriminative non-occluded regions. However, in real-world scenarios, facial images are influenced by various factors, including hair, masks and sunglasses, making it difficult to extract high-quality features from these occluded facial images. This inevitably limits the effectiveness of attention mechanisms. In this paper, we observe a correlation in facial emotion features from the same image, both with and without occlusion. This correlation contributes to addressing the issue of facial occlusions. To this end, we propose a Information Gap based Knowledge Distillation (IGKD) to explore the latent relationship. Specifically, our approach involves feeding non-occluded and masked images into separate teacher and student networks. Due to the incomplete emotion information in the masked images, there exists an information gap between the teacher and student networks. During training, we aim to minimize this gap to enable the student network to learn this relationship. To enhance the teacher’s guidance, we introduce a joint learning strategy where the teacher conducts knowledge distillation on the student during the training of the teacher. Additionally, we introduce two novel constraints, called knowledge learn and knowledge feedback loss, to supervise and optimize both the teacher and student networks. The reported experimental results show that IGKD outperforms other algorithms on four benchmark datasets. Specifically, our IGKD achieves 87.57% on Occlusion-RAF-DB, 87.33% on Occlusion-FERPlus, 64.86% on Occlusion-AffectNet, and 73.25% on FED-RO, clearly demonstrating its effectiveness and robustness. Source code is released at: .</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"154 ","pages":"Article 105365"},"PeriodicalIF":4.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885624004700","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Facial Expression Recognition (FER) under occlusion is a challenging task in computer vision because facial occlusions degrade visual features. Recently, researchers have introduced region attention techniques to address this problem, which make the model perceive occluded regions of the face and prioritize the most discriminative non-occluded regions. However, in real-world scenarios, facial images are affected by various occluders, including hair, masks, and sunglasses, making it difficult to extract high-quality features from occluded facial images. This inevitably limits the effectiveness of attention mechanisms. In this paper, we observe a correlation between the facial emotion features extracted from the same image with and without occlusion, and this correlation helps address the problem of facial occlusion. To this end, we propose an Information Gap based Knowledge Distillation (IGKD) method to exploit this latent relationship. Specifically, our approach feeds non-occluded and masked images into separate teacher and student networks. Because the emotion information in the masked images is incomplete, an information gap exists between the teacher and student networks. During training, we aim to minimize this gap so that the student network can learn the relationship. To strengthen the teacher's guidance, we introduce a joint learning strategy in which the teacher distills knowledge to the student while the teacher itself is still being trained. Additionally, we introduce two novel constraints, the knowledge learn loss and the knowledge feedback loss, to supervise and optimize both the teacher and student networks. Experimental results show that IGKD outperforms other algorithms on four benchmark datasets: IGKD achieves 87.57% on Occlusion-RAF-DB, 87.33% on Occlusion-FERPlus, 64.86% on Occlusion-AffectNet, and 73.25% on FED-RO, clearly demonstrating its effectiveness and robustness. Source code is released at: .
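As a rough illustration of the training strategy the abstract describes, the sketch below pairs a teacher network fed non-occluded images with a student network fed masked images and optimizes both jointly. The abstract does not give concrete loss formulations, so the ResNet-18 backbone, the MSE form of the information-gap loss, the KL-divergence forms of the knowledge learn and knowledge feedback losses, and the weights alpha and beta are all assumptions made for illustration, not the authors' published implementation.

```python
# Illustrative sketch of an IGKD-style training step (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class FERNet(nn.Module):
    """Feature extractor plus expression classifier (assumed ResNet-18 backbone)."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        backbone = resnet18(weights=None)
        # Keep everything up to (and including) global average pooling.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)  # latent emotion features, shape (B, 512)
        return f, self.classifier(f)

teacher, student = FERNet(), FERNet()
optimizer = torch.optim.Adam(
    list(teacher.parameters()) + list(student.parameters()), lr=1e-4
)

def train_step(clean_img, masked_img, label, alpha=1.0, beta=0.1):
    # Teacher sees the non-occluded image; student sees the masked one.
    f_t, logits_t = teacher(clean_img)
    f_s, logits_s = student(masked_img)

    # Information-gap term (assumed MSE): pull the student's occluded-image
    # features toward the teacher's full-information features.
    gap_loss = F.mse_loss(f_s, f_t.detach())

    # "Knowledge learn" term (assumed KL): student mimics the teacher's
    # soft predictions in addition to the hard labels.
    learn_loss = F.kl_div(F.log_softmax(logits_s, dim=1),
                          F.softmax(logits_t.detach(), dim=1),
                          reduction="batchmean")

    # "Knowledge feedback" term (assumed KL): the student's predictions feed
    # back into the jointly trained teacher.
    feedback_loss = F.kl_div(F.log_softmax(logits_t, dim=1),
                             F.softmax(logits_s.detach(), dim=1),
                             reduction="batchmean")

    # Both networks are also supervised with ordinary classification loss.
    cls_loss = F.cross_entropy(logits_t, label) + F.cross_entropy(logits_s, label)

    loss = cls_loss + alpha * (gap_loss + learn_loss) + beta * feedback_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, detaching the teacher's outputs in the gap and knowledge learn terms while letting gradients from the feedback term reach the teacher mirrors the joint learning idea: the teacher guides the student during its own training, and the student's signal in turn refines the teacher.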
Source Journal
Image and Vision Computing (Engineering Technology - Engineering: Electrical & Electronic)
CiteScore: 8.50
Self-citation rate: 8.50%
Annual articles: 143
Review time: 7.8 months
Journal Introduction: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.