{"title":"Fine Tuning Vision Transformer Model for Facial Emotion Recognition: Performance Analysis for Human-Machine Teaming","authors":"Sanjeev Roka, D. Rawat","doi":"10.1109/IRI58017.2023.00030","DOIUrl":null,"url":null,"abstract":"Facial Emotion Recognition (FER) has become essential in various domains, including robotic systems, affective computing, emotion-triggered intelligent agents, and human-computer interaction for human-machine teaming. Although Convolutional Neural Network (CNN)-based models were popular for facial emotion classification, Transformer-based models have shown better performance in computer vision tasks such as image classification, semantic segmentation, and object detection. In this study, we explore the performance of the Vision Transformer model on a publicly available large FER dataset called AffectNet, which provides a realistic representation of emotions “in the wild.” We fine-tuned the model for the emotion classification task based on facial expressions. We achieved an accuracy of 64.48% on the Affectnet validation set, outperforming many other methods that use only transformer models. Further, we explore how they can be used for Human-Machine Teaming particularly in vehicular systems to improve driver safety, comfort, and experience.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI58017.2023.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Facial Emotion Recognition (FER) has become essential in various domains, including robotic systems, affective computing, emotion-triggered intelligent agents, and human-computer interaction for human-machine teaming. Although Convolutional Neural Network (CNN)-based models have been the popular choice for facial emotion classification, Transformer-based models have shown better performance in computer vision tasks such as image classification, semantic segmentation, and object detection. In this study, we explore the performance of the Vision Transformer (ViT) model on a publicly available large FER dataset called AffectNet, which provides a realistic representation of emotions “in the wild.” We fine-tuned the model for the emotion classification task based on facial expressions and achieved an accuracy of 64.48% on the AffectNet validation set, outperforming many other methods that use only transformer models. Further, we explore how such models can be used for human-machine teaming, particularly in vehicular systems, to improve driver safety, comfort, and experience.
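As a concrete illustration of the fine-tuning recipe the abstract describes, the sketch below adapts a pretrained Vision Transformer to AffectNet's eight emotion categories with the Hugging Face transformers library. The checkpoint name, learning rate, and data pipeline here are illustrative assumptions for this sketch, not the authors' reported configuration.

```python
# Minimal sketch: fine-tuning a pretrained ViT for 8-way facial emotion
# classification (AffectNet's eight categories). Hyperparameters and the
# data pipeline are illustrative assumptions, not the paper's exact setup.
import torch
from torch.utils.data import DataLoader
from transformers import ViTForImageClassification, ViTImageProcessor

# AffectNet classes: neutral, happy, sad, surprise, fear, disgust, anger, contempt
NUM_EMOTIONS = 8

# Assumed checkpoint; the classification head is re-initialized for 8 labels.
CHECKPOINT = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(CHECKPOINT)
model = ViTForImageClassification.from_pretrained(CHECKPOINT, num_labels=NUM_EMOTIONS)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_epoch(loader: DataLoader) -> float:
    """One pass over (pixel_values, label) batches; returns the mean loss.
    Batches are assumed to hold face images already resized/normalized
    with `processor` (e.g. processor(images, return_tensors="pt"))."""
    model.train()
    total = 0.0
    for pixel_values, labels in loader:
        optimizer.zero_grad()
        out = model(pixel_values=pixel_values.to(device), labels=labels.to(device))
        out.loss.backward()  # cross-entropy is computed internally when labels are passed
        optimizer.step()
        total += out.loss.item()
    return total / len(loader)
```

Passing labels to the forward call lets the library compute the cross-entropy loss internally; at evaluation time, `out.logits.argmax(-1)` yields the predicted emotion class for each face.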