Automatic Engagement Prediction with GAP Feature
Xuesong Niu, Hu Han, Jiabei Zeng, Xuran Sun, S. Shan, Yan Huang, Songfan Yang, Xilin Chen
{"title":"自动订婚预测与GAP功能","authors":"Xuesong Niu, Hu Han, Jiabei Zeng, Xuran Sun, S. Shan, Yan Huang, Songfan Yang, Xilin Chen","doi":"10.1145/3242969.3264982","DOIUrl":null,"url":null,"abstract":"In this paper, we propose an automatic engagement prediction method for the Engagement in the Wild sub-challenge of EmotiW 2018. We first design a novel Gaze-AU-Pose (GAP) feature taking into account the information of gaze, action units and head pose of a subject. The GAP feature is then used for the subsequent engagement level prediction. To efficiently predict the engagement level for a long-time video, we divide the long-time video into multiple overlapped video clips and extract GAP feature for each clip. A deep model consisting of a Gated Recurrent Unit (GRU) layer and a fully connected layer is used as the engagement predictor. Finally, a mean pooling layer is applied to the per-clip estimation to get the final engagement level of the whole video. Experimental results on the validation set and test set show the effectiveness of the proposed approach. In particular, our approach achieves a promising result with an MSE of 0.0724 on the test set of Engagement Prediction Challenge of EmotiW 2018.t with an MSE of 0.072391 on the test set of Engagement Prediction Challenge of EmotiW 2018.","PeriodicalId":308751,"journal":{"name":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"49","resultStr":"{\"title\":\"Automatic Engagement Prediction with GAP Feature\",\"authors\":\"Xuesong Niu, Hu Han, Jiabei Zeng, Xuran Sun, S. Shan, Yan Huang, Songfan Yang, Xilin Chen\",\"doi\":\"10.1145/3242969.3264982\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose an automatic engagement prediction method for the Engagement in the Wild sub-challenge of EmotiW 2018. We first design a novel Gaze-AU-Pose (GAP) feature taking into account the information of gaze, action units and head pose of a subject. The GAP feature is then used for the subsequent engagement level prediction. To efficiently predict the engagement level for a long-time video, we divide the long-time video into multiple overlapped video clips and extract GAP feature for each clip. A deep model consisting of a Gated Recurrent Unit (GRU) layer and a fully connected layer is used as the engagement predictor. Finally, a mean pooling layer is applied to the per-clip estimation to get the final engagement level of the whole video. Experimental results on the validation set and test set show the effectiveness of the proposed approach. 
In particular, our approach achieves a promising result with an MSE of 0.0724 on the test set of Engagement Prediction Challenge of EmotiW 2018.t with an MSE of 0.072391 on the test set of Engagement Prediction Challenge of EmotiW 2018.\",\"PeriodicalId\":308751,\"journal\":{\"name\":\"Proceedings of the 20th ACM International Conference on Multimodal Interaction\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"49\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 20th ACM International Conference on Multimodal Interaction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3242969.3264982\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th ACM International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3242969.3264982","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 49
Abstract
In this paper, we propose an automatic engagement prediction method for the Engagement in the Wild sub-challenge of EmotiW 2018. We first design a novel Gaze-AU-Pose (GAP) feature that takes into account a subject's gaze, action units, and head pose. The GAP feature is then used for the subsequent engagement level prediction. To efficiently predict the engagement level of a long video, we divide it into multiple overlapping clips and extract a GAP feature for each clip. A deep model consisting of a Gated Recurrent Unit (GRU) layer and a fully connected layer is used as the engagement predictor. Finally, a mean pooling layer is applied to the per-clip estimates to obtain the final engagement level of the whole video. Experimental results on the validation and test sets show the effectiveness of the proposed approach. In particular, our approach achieves a promising MSE of 0.0724 on the test set of the Engagement Prediction Challenge of EmotiW 2018.
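As a rough illustration of the pipeline the abstract describes, and not the authors' implementation, the following PyTorch sketch splits per-frame GAP features into overlapping clips, scores each clip with a GRU layer followed by a fully connected layer, and mean-pools the per-clip scores into one engagement level. The feature dimension, clip length, stride, hidden size, and the [0, 1] output range are illustrative assumptions.

```python
# A minimal sketch of the clip-based GRU engagement predictor described above.
# All dimensions and hyperparameters are assumptions, not the paper's settings.
import torch
import torch.nn as nn


def split_into_clips(features: torch.Tensor, clip_len: int = 64, stride: int = 32) -> torch.Tensor:
    """Slice (num_frames, feat_dim) per-frame GAP features into overlapping
    clips of shape (num_clips, clip_len, feat_dim)."""
    starts = range(0, features.shape[0] - clip_len + 1, stride)
    return torch.stack([features[s:s + clip_len] for s in starts])


class EngagementPredictor(nn.Module):
    """GRU layer + fully connected layer, applied to each clip independently."""

    def __init__(self, feat_dim: int = 34, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (num_clips, clip_len, feat_dim)
        _, last_hidden = self.gru(clips)                      # (1, num_clips, hidden_dim)
        per_clip = torch.sigmoid(self.fc(last_hidden.squeeze(0)))  # assumes engagement in [0, 1]
        return per_clip.mean()                                # mean pooling over clips


if __name__ == "__main__":
    frames = torch.randn(300, 34)                 # hypothetical per-frame GAP features
    model = EngagementPredictor()
    score = model(split_into_clips(frames))
    loss = nn.functional.mse_loss(score, torch.tensor(0.75))  # MSE, the challenge metric
    print(float(score), float(loss))
```

The per-clip scores could also be trained directly against the video-level label before pooling; the sketch simply mirrors the clip-then-mean-pool order stated in the abstract.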