Yun Liu, Zhipeng Wen, Leida Li, Peiguang Jing, Daoxin Fan
Pattern Recognition, Volume 172, Article 112401. Published 2025-09-05. DOI: 10.1016/j.patcog.2025.112401. Impact factor: 7.6 (JCR Q1, Computer Science, Artificial Intelligence). Source: https://www.sciencedirect.com/science/article/pii/S0031320325010623
MIGF-Net: Multimodal interaction-guided fusion network for image aesthetics assessment
With the growth of social media, people increasingly post images and comments to share their ideas, providing rich visual and textual semantic information for image aesthetics assessment (IAA). However, most previous works either extract unimodal aesthetic features from the image alone, because comments are difficult to obtain, or combine multimodal information while ignoring the interactive relationship between image and comment, which limits overall performance. To solve these problems, we propose a Multimodal Interaction-Guided Fusion Network (MIGF-Net) for image aesthetics assessment based on both image and comment semantic information, which not only addresses the challenge of comment generation but also provides interactive multimodal feature information. Specifically, considering the coupling mechanism of the image theme, we construct a visual semantic fusion module that extracts visual semantic features from the visual attributes and the theme features. Then, a textual semantic feature extractor is designed to mine the semantic information hidden in comments, which not only addresses the issue of missing comments but also effectively complements the visual semantic features. Furthermore, we establish a Dual-Stream Interaction-Guided Fusion module to fuse the semantic features of images and comments, fully exploring the interactive relationship between images and comments in the human brain’s perception mechanism. Experimental results on two public image aesthetics evaluation datasets demonstrate that our model outperforms current state-of-the-art methods. Our code will be released at https://github.com/wenzhipeng123/MIGF-Net.
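The abstract does not say how the Dual-Stream Interaction-Guided Fusion module is built. The sketch below shows one common way to realize a two-stream, interaction-guided fusion: bidirectional cross-attention. It is an illustrative assumption and not the authors' implementation. The function names (`cross_attend`, `dual_stream_fuse`), the residual addition, the mean pooling, and the toy feature values are all hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, keys_values):
    """Each query vector gathers a softmax-weighted average of the other modality's vectors."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys_values]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values)) for j in range(dim)]
        )
    return attended

def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def dual_stream_fuse(image_feats, text_feats):
    """Hypothetical fusion: each modality is guided by the other, then both streams are joined."""
    # Stream 1: image features attend to comment features.
    img_guided = cross_attend(image_feats, text_feats)
    # Stream 2: comment features attend to image features.
    txt_guided = cross_attend(text_feats, image_feats)
    # Residual addition keeps each modality's own information (an assumption of this sketch).
    img_stream = [[a + b for a, b in zip(x, y)] for x, y in zip(image_feats, img_guided)]
    txt_stream = [[a + b for a, b in zip(x, y)] for x, y in zip(text_feats, txt_guided)]
    # Pool each stream and concatenate into a single fused descriptor.
    return mean_pool(img_stream) + mean_pool(txt_stream)

# Toy inputs: three image-region vectors and two comment-token vectors, all of dimension 4.
image_feats = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
text_feats = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
fused = dual_stream_fuse(image_feats, text_feats)
print(len(fused))  # fused descriptor length: 4 (image stream) + 4 (text stream)
```

In a real model the fused descriptor would feed a regression or distribution-prediction head for the aesthetic score, and the attention would use learned projection matrices; both are omitted here to keep the sketch dependency-free.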
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.