AGILE: Attribute-Guided Identity Independent Learning for Facial Expression Recognition

Authors: Mohd Aquib; Nishchal K. Verma; M. Jaleel Akhtar
DOI: 10.1109/TAFFC.2024.3508536
Journal: IEEE Transactions on Affective Computing, vol. 16, no. 3, pp. 1362-1378
Publication date: 2024-11-28
URL: https://ieeexplore.ieee.org/document/10770589/
Within computer vision, Facial Expression Recognition (FER) is a challenging task whose bottlenecks include obtaining well-separated, compact expression embeddings that are invariant to identity. This study introduces AGILE (Attribute-Guided Identity-Independent Learning), an approach that enhances FER by distilling out identity information and promoting discriminative features. First, an adaptive $\beta$ Variational Autoencoder (VAE) is derived from a fixed $\beta$-VAE architecture by leveraging the theory of the one-dimensional Kalman filter. This enhances disentangled feature learning without compromising reconstruction quality. Then, to achieve the FER objective, a two-stage modular scheme is designed within the adaptive $\beta$-VAE framework. In the first stage, expression-driven identity modeling is proposed, in which an identity encoder is trained with a novel loss to embed the most likely state corresponding to each subject in the latent representation. In the second stage, keeping the identity encoder fixed, an expression encoder is trained with explicit guidance for the latent variables using an adversarial excitation-and-inhibition mechanism. This form of supervision improves the transparency and interpretability of the expression space and helps capture the discriminative expression embeddings required for the downstream classification task. Experimental evaluations demonstrate that AGILE outperforms existing methods in identity and expression separability in the latent space and surpasses state-of-the-art methods on both lab-controlled and in-the-wild datasets, with recognition accuracies of 99.00% on CK+, 90.00% on Oulu-CASIA, 89.01% on MMI, 67.20% on Aff-Wild2, and 68.97% on AFEW.
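The abstract does not spell out how the one-dimensional Kalman filter drives the adaptive $\beta$ schedule, so the following is only a minimal illustrative sketch, not the paper's method. It assumes one plausible scheme: a scalar Kalman filter smooths the noisy per-batch KL divergence, and $\beta$ is set from the filtered estimate relative to a target capacity. All constants and the `adaptive_beta` rule are hypothetical.

```python
import numpy as np

class ScalarKalman:
    """One-dimensional Kalman filter tracking a noisy scalar signal.

    Hypothetical sketch: the exact beta-update rule is not given in the
    abstract; here the filtered KL divergence drives the beta schedule.
    """
    def __init__(self, x0=0.0, p0=1.0, q=1e-3, r=1e-1):
        self.x = x0   # state estimate (filtered KL value)
        self.p = p0   # estimate variance
        self.q = q    # process-noise variance
        self.r = r    # measurement-noise variance

    def update(self, z):
        # Predict: the state is modeled as constant; variance grows by q.
        self.p += self.q
        # Correct: blend the prediction with the new measurement z.
        k = self.p / (self.p + self.r)        # Kalman gain
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
        return self.x

def adaptive_beta(kl_filtered, kl_target=5.0, beta_min=0.1, beta_max=10.0):
    # Raise beta when the filtered KL exceeds the target capacity,
    # lower it otherwise (all constants here are illustrative).
    return float(np.clip(kl_filtered / kl_target, beta_min, beta_max))

# Simulate noisy per-batch KL measurements around a true value of 6.0.
rng = np.random.default_rng(0)
kf = ScalarKalman(x0=0.0)
for _ in range(200):
    kl_batch = 6.0 + 0.5 * rng.standard_normal()
    kl_hat = kf.update(kl_batch)

beta = adaptive_beta(kl_hat)
print(round(kl_hat, 2), round(beta, 2))
```

Because the filter averages out batch noise, $\beta$ changes smoothly rather than jumping with every minibatch, which is consistent with the abstract's claim that adaptation preserves reconstruction quality.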
Journal introduction:
The IEEE Transactions on Affective Computing is an international and interdisciplinary journal. Its primary goal is to share research findings on the development of systems capable of recognizing, interpreting, and simulating human emotions and related affective phenomena. The journal publishes original research on the underlying principles and theories that explain how and why affective factors shape human-technology interactions. It also covers how techniques for sensing and simulating affect can advance our understanding of human emotions and processes, and explores the design, implementation, and evaluation of systems whose usability depends on careful treatment of affect. Surveys of existing work that provide new perspectives on the history and future directions of the field are also welcome.