{"title":"Multimodal sentiment analysis based on multi-head self-attention and convolutional block attention module","authors":"Feng Geng, Haihoua Yang, Changde Wu, Jinqiang Li","doi":"10.1109/aemcse55572.2022.00059","DOIUrl":null,"url":null,"abstract":"Sarcasm is a type of emotional expression. Sarcasm is commonly used on social media to express the inverse of what appears to be a statement and what is said. Previous automatic sarcasm detection mainly focused on text. With the rise of image sharing mode on social media platforms, text cannot fully reveal users’ emotions, so people begin to study multimodal sentiment analysis by combining text and images. Previous researches on sarcasm detection have used Bidirectional Long Short-term Memory Network (Bi-LSTM) and Residual Network (ResNet) to extract text and image feature vectors, respectively. While Multi-Head Self-Attention (MH-SA) is added to the Bi-LSTM model to perform relation extraction, which can effectively avoid complex feature engineering in traditional tasks. In the process of image extraction, the channel attention module (CAM) and the spatial attention module (SAM) are used to weight different spatial and channel features and focus on different regions and features of the image. The two complement each other, greatly improving the network’s ability to express features. On the Twitter dataset, our proposed model has a sarcasm detection accuracy of 87.55 %, which outperforms most models proposed in current papers.","PeriodicalId":309096,"journal":{"name":"2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/aemcse55572.2022.00059","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Sarcasm is a form of emotional expression commonly used on social media, in which the intended meaning is the opposite of what is literally stated. Previous work on automatic sarcasm detection focused mainly on text. With the rise of image sharing on social media platforms, text alone cannot fully reveal users' emotions, so researchers have begun to study multimodal sentiment analysis that combines text and images. Previous research on sarcasm detection has used a Bidirectional Long Short-Term Memory network (Bi-LSTM) and a Residual Network (ResNet) to extract text and image feature vectors, respectively. In our model, Multi-Head Self-Attention (MH-SA) is added to the Bi-LSTM to perform relation extraction, which effectively avoids the complex feature engineering required by traditional approaches. In the image feature extraction stage, a channel attention module (CAM) and a spatial attention module (SAM) weight different channel and spatial features, attending to different regions and aspects of the image. The two modules complement each other, greatly improving the network's ability to represent features. On the Twitter dataset, our proposed model achieves a sarcasm detection accuracy of 87.55%, outperforming most models proposed in recent papers.
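To make the described pipeline concrete, below is a minimal PyTorch sketch of this kind of architecture: a Bi-LSTM with multi-head self-attention for the text branch, and CBAM-style channel and spatial attention applied to ResNet-like image feature maps, with late fusion for classification. The module sizes, pooling choices, and fusion strategy are illustrative assumptions; the abstract does not specify the authors' exact implementation.

```python
# Hedged sketch, not the authors' code: Bi-LSTM + MH-SA for text,
# CBAM (channel then spatial attention) for image features, late fusion.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):  # CAM: reweight feature channels
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                    # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w

class SpatialAttention(nn.Module):  # SAM: reweight spatial locations
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # (B, 1, H, W)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class SarcasmModel(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=128, hidden=128,
                 heads=4, img_channels=2048, num_classes=2):
        super().__init__()
        # Text branch: Bi-LSTM followed by multi-head self-attention.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.mhsa = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        # Image branch: CBAM over backbone feature maps, e.g. the
        # (B, 2048, 7, 7) output of a ResNet (assumed shape).
        self.cam = ChannelAttention(img_channels)
        self.sam = SpatialAttention()
        self.classifier = nn.Linear(2 * hidden + img_channels, num_classes)

    def forward(self, tokens, img_feats):
        h, _ = self.bilstm(self.embed(tokens))  # (B, T, 2*hidden)
        h, _ = self.mhsa(h, h, h)               # self-attention over tokens
        text_vec = h.mean(dim=1)                # pool to (B, 2*hidden)
        v = self.sam(self.cam(img_feats))       # CBAM-refined feature maps
        img_vec = v.mean(dim=(2, 3))            # pool to (B, C)
        return self.classifier(torch.cat([text_vec, img_vec], dim=1))

# Usage on dummy inputs:
model = SarcasmModel()
logits = model(torch.randint(0, 30000, (2, 20)), torch.randn(2, 2048, 7, 7))
print(logits.shape)  # torch.Size([2, 2])
```

Applying the channel attention before the spatial attention follows the original CBAM design; mean-pooling the attended token states and feature maps before concatenation is one simple fusion choice among several the paper could plausibly use.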