{"title":"CVRSF-Net: Image Emotion Recognition by Combining Visual Relationship Features and Scene Features","authors":"Yutong Luo;Xinyue Zhong;Jialan Xie;Guangyuan Liu","doi":"10.1109/TETCI.2025.3543300","DOIUrl":null,"url":null,"abstract":"Image emotion recognition, which aims to analyze the emotional responses of people to various stimuli in images, has attracted substantial attention in recent years with the proliferation of social media. As human emotion is a highly complex and abstract cognitive process, simply extracting local or global features from an image is not sufficient for recognizing the emotion of an image. The psychologist Moshe proposed that visual objects are usually embedded in a scene with other related objects during human visual comprehension of images. Therefore, we propose a two-branch emotion-recognition network known as the combined visual relationship feature and scene feature network (CVRSF-Net). In the scene feature-extraction branch, a pretrained CLIP model is adopted to extract the visual features of images, with a feature channel weighting module to extract the scene features. In the visual relationship feature-extraction branch, a visual relationship detection model is used to extract the visual relationships in the images, and a semantic fusion module fuses the scenes and visual relationship features. Furthermore, we spatially weight the visual relationship features using class activation maps. Finally, the implicit relationships between different visual relationship features are obtained using a graph attention network, and a two-branch network loss function is designed to train the model. The experimental results showed that the recognition rates of the proposed network were 79.80%, 69.81%, and 36.72% for the FI-8, Emotion-6, and WEBEmo datasets, respectively. 
The proposed algorithm achieves state-of-the-art results compared to existing methods.","PeriodicalId":13135,"journal":{"name":"IEEE Transactions on Emerging Topics in Computational Intelligence","volume":"9 3","pages":"2321-2333"},"PeriodicalIF":5.3000,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computational Intelligence","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10918804/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Image emotion recognition, which aims to analyze the emotional responses of people to various stimuli in images, has attracted substantial attention in recent years with the proliferation of social media. Because human emotion is a highly complex and abstract cognitive process, simply extracting local or global features from an image is not sufficient for recognizing its emotion. The psychologist Moshe Bar proposed that, during human visual comprehension of images, visual objects are usually embedded in a scene together with other related objects. We therefore propose a two-branch emotion-recognition network, the combined visual relationship feature and scene feature network (CVRSF-Net). In the scene feature-extraction branch, a pretrained CLIP model extracts the visual features of images, and a feature channel weighting module extracts the scene features. In the visual relationship feature-extraction branch, a visual relationship detection model extracts the visual relationships in the images, and a semantic fusion module fuses the scene and visual relationship features. Furthermore, we spatially weight the visual relationship features using class activation maps. Finally, the implicit relationships between different visual relationship features are obtained using a graph attention network, and a two-branch loss function is designed to train the model. The experimental results showed that the recognition rates of the proposed network were 79.80%, 69.81%, and 36.72% on the FI-8, Emotion-6, and WEBEmo datasets, respectively. The proposed algorithm achieves state-of-the-art results compared with existing methods.
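The abstract describes two concrete operations in the visual relationship branch: spatially weighting each relationship feature map with a class activation map, and aggregating the resulting feature vectors with a graph attention network. The sketch below illustrates both steps in plain numpy under stated assumptions; the feature dimensions, the fully connected graph over relationship features, and the single-head attention form (in the style of Velickovic et al.'s GAT) are illustrative choices, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cam_weight(features, cam):
    """Spatially weight a (C, H, W) feature map by an (H, W) class
    activation map, then pool to a (C,) relationship feature vector."""
    cam = cam / (cam.sum() + 1e-8)           # normalize CAM into a spatial distribution
    weighted = features * cam[None, :, :]    # broadcast the map over channels
    return weighted.sum(axis=(1, 2))         # CAM-weighted global pooling

def graph_attention(nodes, W, a, slope=0.2):
    """Single-head graph attention over a fully connected graph of N
    relationship features of dim D; W is (D, D'), a is (2*D',)."""
    h = nodes @ W                            # (N, D') projected features
    N = h.shape[0]
    # pairwise attention logits: a^T [h_i || h_j]
    logits = np.array([[a @ np.concatenate([h[i], h[j]]) for j in range(N)]
                       for i in range(N)])
    alpha = softmax(np.where(logits > 0, logits, slope * logits), axis=1)
    return alpha @ h                         # attention-weighted aggregation

# Hypothetical shapes: 3 relationship feature maps with matching CAMs.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 256, 7, 7))
cams = rng.random((3, 7, 7))
nodes = np.stack([cam_weight(f, c) for f, c in zip(feats, cams)])  # (3, 256)

W = rng.standard_normal((256, 64)) * 0.05
a = rng.standard_normal(128) * 0.05
out = graph_attention(nodes, W, a)
print(out.shape)  # (3, 64)
```

In this reading, the CAM acts as a soft spatial mask that emphasizes emotionally salient regions before pooling, and the attention weights let each relationship feature aggregate evidence from the others, which is one plausible way to realize the "implicit relationships" the abstract mentions.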
Journal introduction:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.