{"title":"面向面部情感识别的微调视觉变压器模型:人机协作的性能分析","authors":"Sanjeev Roka, D. Rawat","doi":"10.1109/IRI58017.2023.00030","DOIUrl":null,"url":null,"abstract":"Facial Emotion Recognition (FER) has become essential in various domains, including robotic systems, affective computing, emotion-triggered intelligent agents, and human-computer interaction for human-machine teaming. Although Convolutional Neural Network (CNN)-based models were popular for facial emotion classification, Transformer-based models have shown better performance in computer vision tasks such as image classification, semantic segmentation, and object detection. In this study, we explore the performance of the Vision Transformer model on a publicly available large FER dataset called AffectNet, which provides a realistic representation of emotions “in the wild.” We fine-tuned the model for the emotion classification task based on facial expressions. We achieved an accuracy of 64.48% on the Affectnet validation set, outperforming many other methods that use only transformer models. Further, we explore how they can be used for Human-Machine Teaming particularly in vehicular systems to improve driver safety, comfort, and experience.","PeriodicalId":290818,"journal":{"name":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Fine Tuning Vision Transformer Model for Facial Emotion Recognition: Performance Analysis for Human-Machine Teaming\",\"authors\":\"Sanjeev Roka, D. Rawat\",\"doi\":\"10.1109/IRI58017.2023.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Facial Emotion Recognition (FER) has become essential in various domains, including robotic systems, affective computing, emotion-triggered intelligent agents, and human-computer interaction for human-machine teaming. Although Convolutional Neural Network (CNN)-based models were popular for facial emotion classification, Transformer-based models have shown better performance in computer vision tasks such as image classification, semantic segmentation, and object detection. In this study, we explore the performance of the Vision Transformer model on a publicly available large FER dataset called AffectNet, which provides a realistic representation of emotions “in the wild.” We fine-tuned the model for the emotion classification task based on facial expressions. We achieved an accuracy of 64.48% on the Affectnet validation set, outperforming many other methods that use only transformer models. 
Further, we explore how they can be used for Human-Machine Teaming particularly in vehicular systems to improve driver safety, comfort, and experience.\",\"PeriodicalId\":290818,\"journal\":{\"name\":\"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IRI58017.2023.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 24th International Conference on Information Reuse and Integration for Data Science (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI58017.2023.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Fine Tuning Vision Transformer Model for Facial Emotion Recognition: Performance Analysis for Human-Machine Teaming
Facial Emotion Recognition (FER) has become essential in various domains, including robotic systems, affective computing, emotion-triggered intelligent agents, and human-computer interaction for human-machine teaming. Although Convolutional Neural Network (CNN)-based models have been popular for facial emotion classification, Transformer-based models have shown better performance in computer vision tasks such as image classification, semantic segmentation, and object detection. In this study, we explore the performance of the Vision Transformer model on a publicly available large FER dataset called AffectNet, which provides a realistic representation of emotions “in the wild.” We fine-tuned the model for the emotion classification task based on facial expressions, achieving an accuracy of 64.48% on the AffectNet validation set and outperforming many other methods that use only Transformer models. Further, we explore how such models can be used for human-machine teaming, particularly in vehicular systems, to improve driver safety, comfort, and experience.
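The abstract describes fine-tuning a pretrained Vision Transformer, with its classification head replaced, on AffectNet's emotion categories. The paper's own code is not reproduced here; the following is a minimal illustrative sketch using the Hugging Face transformers library, in which the checkpoint name, the 8-category label set, and all hyperparameters are assumptions for illustration rather than the authors' actual configuration.

```python
# Minimal sketch of fine-tuning a pretrained Vision Transformer for
# facial emotion classification. NOT the authors' implementation; the
# checkpoint, label set, and hyperparameters are illustrative assumptions.
import torch
from torch.utils.data import DataLoader
from transformers import ViTForImageClassification, ViTImageProcessor

# AffectNet annotates 8 basic emotion categories (label set assumed here).
LABELS = ["neutral", "happy", "sad", "surprise", "fear", "disgust", "anger", "contempt"]

# Processor resizes/normalizes images to the format the ViT backbone expects.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=len(LABELS),        # replaces the 1000-class ImageNet head
    ignore_mismatched_sizes=True,  # new classification head is randomly initialized
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).train()

# `train_loader` is assumed to yield dicts with `pixel_values` tensors
# (built from AffectNet images via `processor`) and integer `labels`.
def train_one_epoch(train_loader: DataLoader) -> None:
    for batch in train_loader:
        outputs = model(
            pixel_values=batch["pixel_values"].to(device),
            labels=batch["labels"].to(device),
        )
        outputs.loss.backward()    # cross-entropy loss computed internally
        optimizer.step()
        optimizer.zero_grad()
```

One design point worth noting: passing `num_labels` with `ignore_mismatched_sizes=True` keeps the pretrained backbone weights while discarding only the ImageNet classifier, which is the standard way to adapt a pretrained ViT checkpoint to a new label space before fine-tuning end to end.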