Xiaofeng Wang, Tianbo Han, Songling Liu, Muhammad Shahroz Ajmal, Lu Chen, Yongqin Zhang, Yonghuai Liu
Title: MHAN: Multi-head hybrid attention network for facial expression recognition
DOI: 10.1016/j.patcog.2025.112015
Journal: Pattern Recognition, Volume 170, Article 112015 (JCR Q1, Computer Science, Artificial Intelligence; Impact Factor 7.5)
Publication date: 2025-07-03
URL: https://www.sciencedirect.com/science/article/pii/S0031320325006752
Citations: 0
Abstract
Integrating Facial Expression Recognition (FER) with deep learning techniques has significantly enhanced emotion analysis performance over the past decade. Convolutional neural networks (CNNs) and attention mechanisms facilitate the automatic extraction of complex features from facial expressions. However, current methods often struggle to accurately capture subtle variations in expression, tend to be computationally intensive, and are susceptible to overfitting. To address these challenges, this paper proposes a lightweight FER model based on multi-head hybrid attention networks (MHAN). It introduces two novel modules: an efficient local attention mixed feature network (ELA-MFN) and a multi-head hybrid attention mechanism (MHAtt). The former integrates multi-scale convolutional kernels with the ELA attention mechanism to enhance feature representation while precisely localizing critical facial regions, all within a lightweight framework. The latter uses multiple attention heads to generate attention maps that capture subtle distinctions between expressions. With only 4.27M parameters (a 94% reduction from POSTER's 71.8M), MHAN substantially reduces computational resource requirements and can be applied efficiently to both fully supervised and semi-supervised learning tasks. It also employs a smoothed-label loss function to mitigate overfitting. We validated the effectiveness of MHAN on three public datasets, RAF-DB, AffectNet, and FERPlus, including cross-dataset tests. The results show that MHAN outperforms state-of-the-art models in both accuracy and computational complexity, demonstrating improved robustness. MHAN can also recognize expressions in non-traditional data such as sculptures, validating its cross-domain generalization capabilities. The source code is available at https://github.com/hanyao666/MHAN.
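The abstract does not give the exact formulation of the smoothed-label loss, but the standard technique it names, label smoothing, replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution before taking cross-entropy, which discourages overconfident predictions. The sketch below is a generic NumPy illustration of that idea, not the authors' implementation; the function name and the smoothing factor `eps=0.1` are illustrative assumptions.

```python
import numpy as np

def smooth_label_loss(logits, target, eps=0.1):
    """Cross-entropy with label smoothing (generic sketch, not the paper's
    exact loss). The one-hot target is mixed with a uniform distribution:
    (1 - eps) mass on the true class, eps spread evenly over all classes."""
    n = logits.shape[-1]
    # log-softmax, numerically stabilized by subtracting the max logit
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # smoothed target distribution
    q = np.full(n, eps / n)
    q[target] += 1.0 - eps
    # cross-entropy between smoothed target q and predicted log-probs
    return -(q * log_p).sum()
```

With `eps=0` this reduces to ordinary cross-entropy; with `eps>0` a confidently correct prediction still incurs a small penalty, which is the regularizing effect the abstract credits with reducing overfitting.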
Journal Introduction:
The field of pattern recognition is both mature and rapidly evolving, playing a crucial role in related areas such as computer vision, image processing, text analysis, and neural networks. It intersects closely with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.