Hard semantic mask strategy for automatic facial action unit recognition with teacher–student model

IF 3.5 · CAS Zone 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Multimedia Systems · Pub Date: 2024-06-20 · DOI: 10.1007/s00530-024-01385-x

Zichen Liang, Haiying Xia, Yumei Tan, Shuxiang Song


Citations: 0

Abstract

The Facial Action Coding System (FACS) is widely used in affective computing; it defines a set of facial action units (AUs), each corresponding to a localized region of the face. Fine-grained features from critical regions are crucial for accurate AU recognition. However, the conventional random masking used in Masked Image Modeling (MIM) often overlooks the inherent symmetry of faces and the complex interrelationships among facial muscles, which leads to a loss of critical local detail and poor AU recognition performance. To address these limitations, we propose a novel teacher–student MIM framework called Hard Semantic Masking Strategy Teacher–Student (HSMS-TS). Specifically, we first introduce a hard semantic mask strategy in the teacher model, which aims to guide the student network to focus on learning fine-grained AU-related representations. Then, the student network uses the attention maps from the pretrained teacher model to generate a more challenging mask from a predefined template, increasing the learning difficulty and helping the student acquire better AU-related representations. Experimental results on two publicly available datasets, BP4D and DISFA, demonstrate the effectiveness and strong performance of the proposed method. Code will be publicly available at http://github.com/lzichen/HSMS-TS.
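To make the contrast between random masking and attention-guided masking concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the masking rule or the predefined template, so the function names, the "hide the most-attended patches" rule, the mask ratio, and the toy attention scores are all assumptions made for illustration.

```python
import random
from typing import List


def random_mask(num_patches: int, mask_ratio: float, seed: int = 0) -> List[bool]:
    """Baseline MIM masking: hide a uniformly random subset of patches."""
    num_masked = round(num_patches * mask_ratio)
    hidden = set(random.Random(seed).sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def attention_guided_mask(attention: List[float], mask_ratio: float) -> List[bool]:
    """Hide the patches the teacher attends to most (assumed rule, illustrative only)."""
    num_masked = round(len(attention) * mask_ratio)
    # Rank patch indices by teacher attention score, highest first.
    ranked = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    hidden = set(ranked[:num_masked])
    return [i in hidden for i in range(len(attention))]


# Toy 2x4 patch grid, flattened row by row; the scores are made up.
teacher_attention = [0.02, 0.30, 0.25, 0.03, 0.05, 0.20, 0.10, 0.05]
hard_mask = attention_guided_mask(teacher_attention, mask_ratio=0.5)
easy_mask = random_mask(len(teacher_attention), mask_ratio=0.5)
```

In this toy setting, `True` marks a hidden patch. The attention-guided mask always removes the highest-scoring patches, so the student must reconstruct exactly the regions the teacher considers informative, whereas the random mask may leave those regions visible.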





Source journal

Multimedia Systems (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore: 5.40

Self-citation rate: 7.70%

Articles published: 148

Review time: 4.5 months

About the journal: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.