Multi-loss, feature fusion and improved top-two-voting ensemble for facial expression recognition in the wild
Guangyao Zhou, Yuanlun Xie, Yiqin Fu, Zhaokun Wang
Neural Networks, Volume 183, Article 106937. Published 2024-11-26. DOI: 10.1016/j.neunet.2024.106937
https://www.sciencedirect.com/science/article/pii/S0893608024008669
Citations: 0
Abstract
Facial expression recognition (FER) in the wild is a challenging pattern recognition task, made harder by low image quality, and has attracted broad interest in computer vision. Existing FER methods have not achieved sufficient accuracy to support practical applications, especially in scenarios with low fault tolerance, which limits the adaptability of FER. To explore whether the accuracy of FER in the wild can be improved further, this paper proposes a novel single model named R18+FAML and an ensemble model named R18+FAML-FGA-T2V, which apply intra-feature fusion within a single network, feature fusion among multiple networks, and an ensemble decision strategy. Built on a ResNet18 (R18) backbone, R18+FAML combines internal feature fusion with three attention blocks and uses multiple loss functions (FAML) to improve the diversity of feature extraction. To effectively integrate feature extractors from multiple networks, we propose feature fusion among networks based on a genetic algorithm (FGA). To take more classification information into account, we propose an ensemble strategy, namely an improved top-two-voting (T2V) scheme over multiple networks with the same structure. Combining these strategies, R18+FAML-FGA-T2V can focus on the main expression-aware areas by integrating the areas of interest of multiple networks. In experiments on three challenging in-the-wild FER datasets, RAF-DB, AffectNet-8 and AffectNet-7, our single model R18+FAML and ensemble model R18+FAML-FGA-T2V achieve accuracies of (90.32, 62.17, 65.83)% and (91.59, 63.27, 66.63)% respectively, both state-of-the-art results.
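The abstract does not spell out how the improved top-two-voting rule works, so the following is only a generic illustration of the underlying idea: each network votes for its two most likely classes rather than only its top one. The function name `top_two_vote`, the weights `first_weight` and `second_weight`, and the tie-break by mean probability are all assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np


def top_two_vote(probs, first_weight=1.0, second_weight=0.5):
    """Pick a class from per-model probabilities using top-two voting.

    probs: array-like of shape (n_models, n_classes), one row per network.
    The weights are hypothetical; the paper's improved T2V may differ.
    """
    probs = np.asarray(probs, dtype=float)
    n_models, n_classes = probs.shape
    scores = np.zeros(n_classes)

    # Class indices sorted by descending probability, per model.
    order = np.argsort(-probs, axis=1)
    for m in range(n_models):
        scores[order[m, 0]] += first_weight   # vote for the top class
        scores[order[m, 1]] += second_weight  # smaller vote for the runner-up

    # Assumed tie-break: among the top-scoring classes, prefer the one
    # with the highest mean probability across models.
    best = np.flatnonzero(scores == scores.max())
    mean_prob = probs.mean(axis=0)
    return int(best[np.argmax(mean_prob[best])])


# Three hypothetical networks scoring one image over three classes.
example = [
    [0.50, 0.40, 0.10],
    [0.10, 0.50, 0.40],
    [0.45, 0.40, 0.15],
]
print(top_two_vote(example))                     # top-two voting
print(top_two_vote(example, second_weight=0.0))  # reduces to plurality voting
```

In this toy input, plurality voting favors class 0 because two of the three networks rank it first, while the runner-up votes bring class 1 level with it and the tie-break then selects class 1. This shows how second-choice information can change an ensemble decision.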
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically-inspired artificial intelligence.