Eye Tracking and Emotion Recognition Using Multiple Spatial-Temporal Networks
Eprian Junan Setianto, Esmeralda Contessa Djamal, Fikri Nugraha, Fatan Kasyidi
2022 International Conference on Data Science and Its Applications (ICoDSA), 2022-07-06
DOI: 10.1109/ICoDSA55874.2022.9862881 (https://doi.org/10.1109/ICoDSA55874.2022.9862881)
Citations: 0
Abstract
E-commerce products need to be evaluated more objectively through viewer responses, for example by identifying emotional expressions or by eye tracking. Capturing these two variables from video provides a more thorough evaluation of interest and emotional response. This study proposes a spatial-temporal multi-network method that combines Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN) on 60-second videos. The results showed that two classes of emotional expression with four eye-tracking directions gave better accuracy (95.83%) than three emotion classes with four eye-tracking directions (91.67%). Experiments also showed that using CNN-LSTM significantly increased accuracy, while the weight-correction technique had little effect. The evaluated F1 scores confirm the consistency of the proposed model.
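To make the spatial-temporal idea concrete, the sketch below shows one way a CNN-LSTM video classifier of this kind can be wired up in PyTorch: a CNN extracts spatial features from each frame, and an LSTM models the sequence of frame features over time. This is a minimal illustration under assumed settings (30 sampled frames, 64x64 RGB input, the layer sizes shown), not the authors' actual architecture; the paper's multi-network design would use separate branches like this for emotion expression and eye-tracking direction.

```python
import torch
import torch.nn as nn


class CNNLSTM(nn.Module):
    """Per-frame CNN features fed to an LSTM over time (spatial-temporal)."""

    def __init__(self, num_classes: int, hidden_size: int = 128):
        super().__init__()
        # Spatial feature extractor, applied independently to every frame.
        # Layer sizes here are illustrative assumptions, not from the paper.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
            nn.Flatten(),  # -> 32 * 4 * 4 = 512 features per frame
        )
        # Temporal model over the sequence of per-frame features.
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold time into the batch dimension so the CNN sees single frames.
        feats = self.cnn(x.view(b * t, c, h, w)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)  # last hidden state summarizes the clip
        return self.head(h_n[-1])       # class logits


# Usage: two emotion classes, a clip of 30 sampled 64x64 RGB frames.
model = CNNLSTM(num_classes=2)
clip = torch.randn(1, 30, 3, 64, 64)
logits = model(clip)
print(logits.shape)  # torch.Size([1, 2])
```

A separate branch with `num_classes=4` would cover the four eye-tracking directions; training either branch reduces to standard sequence classification with cross-entropy loss.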