Yan Chen, Kexuan Li, Feng Tian, Ganglin Wei, Morteza Seberi
Neurocomputing, vol. 628, Article 129656. Published 2025-02-17. DOI: 10.1016/j.neucom.2025.129656. https://www.sciencedirect.com/science/article/pii/S0925231225003285
Lightweight expression recognition combined attention fusion network with hybrid knowledge distillation for occluded e-learner facial images
In online education, it is crucial to monitor students' learning status promptly and accurately, and facial expression recognition serves as the main tool for assessing their engagement levels. Existing algorithms still have two issues when applied directly to online learning. First, occlusion causes a loss of facial features, which directly reduces the accuracy of expression recognition. Second, expression recognition models require a large number of parameters and significant computational power, making them difficult to deploy on mobile devices with limited hardware resources. We aim to address both issues with a two-stage framework: training an occluded facial expression recognition model, and then compressing that model. Specifically, in the first stage, we propose an occluded facial expression recognition model based on attention fusion (AFNet). It adopts a multi-branch spatial attention network that extracts local facial features, automatically perceives occluded facial regions, and reduces the weight of the occluded areas, and it enhances robustness to occlusion by combining this network with a randomly masked channel network. Meanwhile, a feature pyramid network is introduced to extract global multi-scale features. In the second stage, we propose a hybrid model compression algorithm based on multi-layer knowledge distillation (MKD), in which a spatial attention network focuses on the important knowledge, reducing the information loss during knowledge distillation. Experimental results on five datasets show that AFNet and MKD outperform the baseline.
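The abstract does not spell out the MKD objective, so the following is only a minimal sketch of the classic logit-based distillation loss that multi-layer schemes commonly build on, not the authors' method. It blends a temperature-softened KL term (teacher to student) with ordinary cross-entropy on the true label. The function names, the `temperature` and `alpha` defaults, and the single-sample, pure-Python form are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.7):
    """Generic response-based distillation loss for one sample.

    Hypothetical illustration only: `temperature` and `alpha` are assumed
    hyperparameters, and this is not the MKD loss from the paper.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # The T^2 factor keeps the soft term's gradient scale comparable
    # to the hard-label term when the temperature changes.
    soft_term = (temperature ** 2) * kl

    # Standard cross-entropy against the ground-truth class (temperature 1).
    hard_term = -log_softmax(student_logits)[label]

    return alpha * soft_term + (1.0 - alpha) * hard_term


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # made-up logits for a 3-class example
    student = [1.5, 0.7, -0.8]
    print(distillation_loss(student, teacher, label=0))
```

A multi-layer variant would typically add further terms that match intermediate feature maps between teacher and student, possibly weighted by attention; those terms are omitted here because the abstract gives no details about them.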
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.