AGILE: Attribute-Guided Identity Independent Learning for Facial Expression Recognition

Authors: Mohd Aquib; Nishchal K. Verma; M. Jaleel Akhtar
DOI: 10.1109/TAFFC.2024.3508536
Journal: IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1362-1378
Publication date: 2024-11-28
URL: https://ieeexplore.ieee.org/document/10770589/
Within computer vision, Facial Expression Recognition (FER) is a challenging task whose bottlenecks include obtaining well-separated, compact expression embeddings that are invariant to identity. This study introduces AGILE (Attribute-Guided Identity-Independent Learning), an approach that enhances FER by distilling out identity information and promoting discriminative features. First, an adaptive $\beta$ Variational Autoencoder (VAE) is derived from a fixed $\beta$-VAE architecture by leveraging the theory of the one-dimensional Kalman filter. This enhances disentangled feature learning without compromising reconstruction quality. Then, to achieve the FER objective, a two-stage modular scheme is designed within the adaptive $\beta$-VAE framework. In the first stage, expression-driven identity modeling is proposed, in which an identity encoder is trained with a novel loss to embed the most likely state corresponding to each subject in the latent representation. In the second stage, keeping the identity encoder fixed, an expression encoder is trained with explicit guidance for the latent variables using an adversarial excitation-and-inhibition mechanism. This form of supervision improves the transparency and interpretability of the expression space and helps capture the discriminative expression embeddings required for the downstream classification task. Experimental evaluations demonstrate that AGILE outperforms existing methods in identity and expression separability in the latent space and surpasses state-of-the-art methods on both lab-controlled and in-the-wild datasets, with recognition accuracies of 99.00% on CK+, 90.00% on Oulu-CASIA, 89.01% on MMI, 67.20% on Aff-Wild2, and 68.97% on AFEW.
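The abstract does not spell out how the one-dimensional Kalman filter drives the adaptive $\beta$ schedule, so the following is only a minimal illustrative sketch, not the paper's method. It assumes one plausible scheme: a scalar Kalman filter smooths the noisy per-batch KL divergence, and $\beta$ is set from the filtered estimate relative to a target capacity. All constants and the `adaptive_beta` rule are hypothetical.

```python
import numpy as np

class ScalarKalman:
    """One-dimensional Kalman filter tracking a noisy scalar signal.

    Hypothetical sketch: the exact beta-update rule is not given in the
    abstract; here the filtered KL divergence drives the beta schedule.
    """
    def __init__(self, x0=0.0, p0=1.0, q=1e-3, r=1e-1):
        self.x = x0   # state estimate (filtered KL value)
        self.p = p0   # estimate variance
        self.q = q    # process-noise variance
        self.r = r    # measurement-noise variance

    def update(self, z):
        # Predict: the state is modeled as constant; variance grows by q.
        self.p += self.q
        # Correct: blend the prediction with the new measurement z.
        k = self.p / (self.p + self.r)        # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x

def adaptive_beta(kl_filtered, kl_target=5.0, beta_min=0.1, beta_max=10.0):
    # Raise beta when the filtered KL exceeds the target capacity,
    # lower it otherwise (all constants here are illustrative).
    return float(np.clip(kl_filtered / kl_target, beta_min, beta_max))

# Simulate noisy per-batch KL measurements around a true value of 6.0.
rng = np.random.default_rng(0)
kf = ScalarKalman(x0=0.0)
for _ in range(200):
    kl_batch = 6.0 + 0.5 * rng.standard_normal()
    kl_hat = kf.update(kl_batch)

beta = adaptive_beta(kl_hat)
print(round(kl_hat, 2), round(beta, 2))
```

Because the filter averages out batch noise, $\beta$ changes smoothly rather than jumping with every minibatch, which is consistent with the abstract's claim that adaptation preserves reconstruction quality.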
Journal introduction:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also covers how techniques for sensing and simulating affect can advance our understanding of human emotions and processes, and explores the design, implementation, and evaluation of systems whose usability depends on careful treatment of affect. Surveys of existing work that provide new perspectives on the history and future directions of the field are also welcome.