Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues

Proceedings of the 20th ACM International Conference on Multimodal Interaction. Pub Date: 2018-10-02. DOI: 10.1145/3242969.3264991

Kai Wang, Xiaoxing Zeng, Jianfei Yang, Debin Meng, Kaipeng Zhang, Xiaojiang Peng, Y. Qiao


Citations: 36

Abstract


Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues

This paper presents our approach to the group-level emotion recognition sub-challenge of EmotiW 2018. The task is to classify an image into one of the group emotions: positive, negative, or neutral. Our approach explores three cues, namely face, body, and global image, with recent deep networks. Our main contribution is two-fold. First, we introduce body-based Convolutional Neural Networks (CNNs) into this task, building on our previous winning method [18]. For the body-based CNNs, we crop all bodies in an image with a state-of-the-art human pose estimation method and train the CNNs with the image-level label; the body cue captures a full view of an individual. Second, we propose a cascade attention network for the face cue in images. This network exploits the importance of each face in an image to generate a global representation based on all faces. The cascade attention network is not only complementary to the other models but also improves on the naive average pooling method by about 2%. We finally achieve second place in this sub-challenge, with classification accuracies of 86.9% and 67.48% on the validation set and testing set, respectively.
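The abstract contrasts naive average pooling of face features with an attention mechanism that weights each face by its importance. The abstract does not describe the network's architecture, so the following is only a minimal single-stage sketch of that contrast, not the authors' cascade attention network. The function names, the `query` vector (standing in for learned attention parameters), and the toy 2-D face features are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_pool(features: Sequence[Sequence[float]]) -> List[float]:
    """Naive baseline: every face contributes equally to the image feature."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def attention_pool(
    features: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weight each face by a softmax over its dot-product score with `query`.

    `query` is a hypothetical stand-in for learned attention parameters.
    Returns the pooled feature and the per-face weights.
    """
    scores = [sum(f * q for f, q in zip(face, query)) for face in features]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    pooled = [
        sum(w * face[d] for w, face in zip(weights, features)) for d in range(dim)
    ]
    return pooled, weights


# Toy example: three made-up 2-D face features from one image.
faces = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # assumed attention vector favouring the first dimension

avg = average_pool(faces)
att, weights = attention_pool(faces, query)
```

With these toy values, the first and third faces score higher against `query` and therefore dominate the pooled feature, whereas average pooling treats all three faces alike. A zero `query` gives uniform weights and reduces attention pooling to the average.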


Source journal

Proceedings of the 20th ACM International Conference on Multimodal Interaction

Self-citation rate

0.00%

Number of publications