Kai Wang, Xiaoxing Zeng, Jianfei Yang, Debin Meng, Kaipeng Zhang, Xiaojiang Peng, Y. Qiao
Proceedings of the 20th ACM International Conference on Multimodal Interaction, published 2018-10-02. DOI: https://doi.org/10.1145/3242969.3264991
Cascade Attention Networks For Group Emotion Recognition with Face, Body and Image Cues
This paper presents our approach to the group-level emotion recognition sub-challenge of EmotiW 2018. The task is to classify an image into one of the group emotions: positive, negative, or neutral. Our approach explores three cues, namely face, body, and global image, using recent deep networks. Our main contribution is two-fold. First, we introduce body-based Convolutional Neural Networks (CNNs) into this task, building on our previous winning method [18]. For the body-based CNNs, we crop all bodies in an image with a state-of-the-art human pose estimation method and train the CNNs with the image-level label. The body cue captures a full view of each individual. Second, we propose a cascade attention network for the face cue. This network exploits the importance of each face in an image to generate a global representation based on all faces. The cascade attention network is not only complementary to the other models but also improves on naive average pooling by about 2%. We achieve second place in this sub-challenge, with classification accuracies of 86.9% on the validation set and 67.48% on the test set.
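To make the contrast between naive average pooling and importance-weighted pooling of face features concrete, the following is a minimal sketch and not the authors' cascade attention network. It assumes each face is already described by a feature vector and that importance is scored by a dot product with a vector `query`; the names `average_pool`, `attention_pool`, and `query` are hypothetical and do not appear in the paper.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(face_features):
    """Naive baseline: every face contributes equally."""
    n = len(face_features)
    dim = len(face_features[0])
    return [sum(f[d] for f in face_features) / n for d in range(dim)]


def attention_pool(face_features, query):
    """Weight each face by an importance score before summing.

    Assumption: the score is a dot product between the face feature
    and a vector `query` (learned in a real model, fixed here).
    """
    scores = [sum(x * q for x, q in zip(f, query)) for f in face_features]
    weights = softmax(scores)
    dim = len(face_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, face_features))
        for d in range(dim)
    ]
    return pooled, weights


# Two toy 2-D face features (illustrative values only).
faces = [[1.0, 0.0], [0.0, 1.0]]
uniform_pooled, uniform_weights = attention_pool(faces, [0.0, 0.0])
focused_pooled, focused_weights = attention_pool(faces, [10.0, 0.0])
```

With an all-zero `query`, every face gets the same score, so the weights are uniform and the result coincides with `average_pool`. A `query` that strongly favors one face shifts nearly all of the weight to that face, which is the behavior an attention mechanism adds over plain averaging.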