Zichen Liang, Haiying Xia, Yumei Tan, Shuxiang Song
Title: Hard semantic mask strategy for automatic facial action unit recognition with teacher–student model
Journal: Multimedia Systems (JCR Q2, Computer Science, Information Systems; impact factor 3.5)
Published: 2024-06-20
DOI: https://doi.org/10.1007/s00530-024-01385-x
Citations: 0
Abstract
The Facial Action Coding System (FACS) is widely used in affective computing. It defines a series of facial action units (AUs), each corresponding to a localized region of the face. Fine-grained features from these critical regions are crucial for accurate AU recognition. However, the random masking conventionally used in Masked Image Modeling (MIM) often overlooks the inherent symmetry of faces and the complex interrelationships among facial muscles, so critical local details are lost and AU recognition performance suffers. To address these limitations, we propose a novel teacher-student MIM framework called Hard Semantic Masking Strategy Teacher–Student (HSMS-TS). Specifically, we first introduce a hard semantic mask strategy in the teacher model, which aims to guide the student network to focus on learning fine-grained AU-related representations. Then, the student network uses the attention maps from the pretrained teacher model to generate more challenging masks from a predefined template, increasing the learning difficulty and helping the student acquire better AU-related representations. Experimental results on two publicly available datasets, BP4D and DISFA, demonstrate the effectiveness of the proposed method. Code will be publicly available at http://github.com/lzichen/HSMS-TS.
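The abstract does not give the masking procedure in detail, so the following is only an illustrative sketch of attention-guided masking over a predefined template, not the authors' implementation. It assumes the teacher's attention is available as one score per image patch and that the template is a mapping from region names to patch indices. The function name `hard_semantic_mask`, the region names, and the rule "mask the regions with the highest mean attention first" are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def hard_semantic_mask(
    attention: List[float],
    regions: Dict[str, List[int]],
    mask_ratio: float,
) -> List[int]:
    """Pick patch indices to mask, preferring high-attention template regions.

    attention:  one teacher attention score per patch (flattened grid).
    regions:    hypothetical template mapping region name -> patch indices
                (each region is assumed to be non-empty).
    mask_ratio: fraction of all patches to mask.
    """
    n_patches = len(attention)
    budget = int(round(mask_ratio * n_patches))

    def mean_attention(name: str) -> float:
        idx = regions[name]
        return sum(attention[i] for i in idx) / len(idx)

    # Rank template regions from most to least attended by the teacher.
    ranked = sorted(regions, key=mean_attention, reverse=True)

    masked = set()
    for name in ranked:
        for i in regions[name]:
            if len(masked) >= budget:
                break
            masked.add(i)

    # If the template regions do not fill the budget, top up with the
    # highest-attention patches that are still visible.
    if len(masked) < budget:
        rest = sorted(
            (i for i in range(n_patches) if i not in masked),
            key=lambda i: attention[i],
            reverse=True,
        )
        masked.update(rest[: budget - len(masked)])

    return sorted(masked)


# Toy 4x4 patch grid: rows stand in for brow, eyes, cheeks, mouth.
toy_attention = [0.1] * 4 + [0.9] * 4 + [0.2] * 4 + [0.5] * 4
toy_regions = {
    "brow": [0, 1, 2, 3],
    "eyes": [4, 5, 6, 7],
    "mouth": [12, 13, 14, 15],
}
print(hard_semantic_mask(toy_attention, toy_regions, mask_ratio=0.5))
```

In this toy setup the regions the teacher attends to most are hidden first, which makes reconstruction harder for the student than uniformly random masking would. A template that pairs left and right patches in the same region would also keep the mask consistent with facial symmetry.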
About the journal:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.