A multi-granularity facial aesthetic evaluation model based on image-text modality
Authors: Huanyu Chen, Yong Wang, Weisheng Li, Bin Xiao
Journal: Knowledge-Based Systems, vol. 330, Article 114502 (published 2025-09-19)
DOI: 10.1016/j.knosys.2025.114502
URL: https://www.sciencedirect.com/science/article/pii/S0950705125015412
Impact factor: 7.6; JCR: Q1 (Computer Science, Artificial Intelligence)
Facial Beauty Prediction (FBP) is an emerging research direction at the intersection of artificial intelligence and aesthetics that has attracted increasing attention in recent years. However, most existing methods rely solely on unimodal data and fail to comprehensively capture the multi-dimensional information of facial aesthetics. To address this challenge, we propose a multi-granularity facial aesthetic evaluation model based on image-text modality (ITM-MGFA). By incorporating multi-granularity cognitive theory into the FBP task, the model effectively integrates coarse-grained and fine-grained aesthetic features extracted from the CLIP encoder through three modules: a multi-granularity representation module, a task-oriented dynamic alignment module, and a hierarchical interaction optimization module. This facilitates deep cross-modal interaction and fusion, significantly enhancing the model’s ability to represent complex aesthetic attributes. Experimental results demonstrate that ITM-MGFA, by fusing cross-modal information, achieves higher accuracy on the facial aesthetic assessment task than traditional unimodal methods, offering a new direction for FBP research. Furthermore, the model can be applied in various scenarios, such as simulated postoperative assessment for personalized cosmetic surgery in medical aesthetics, selection of optimal facial aesthetic enhancement solutions on social media, and recommendation of matching solutions in cosmetics recommendation.
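The abstract does not say how the three modules are implemented, so the following is only a toy sketch of the general idea of combining a coarse-grained (global) and a fine-grained (token-to-patch) image-text signal before predicting a score. It is not the authors' ITM-MGFA. The function names (`fuse`, `attend`, `predict_score`), the choice of cosine similarity and scaled dot-product attention, the linear head, and the tiny hand-written vectors standing in for CLIP encoder outputs are all assumptions made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention for one query over key/value vectors.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def fuse(img_global, img_patches, txt_global, txt_tokens):
    # Coarse granularity: agreement between the two global embeddings.
    coarse = cosine(img_global, txt_global)
    # Fine granularity: each text token attends over the image patches, and
    # we average how well each token matches its attended patch summary.
    fine = sum(
        cosine(t, attend(t, img_patches, img_patches)) for t in txt_tokens
    ) / len(txt_tokens)
    return [coarse, fine]

def predict_score(features, weights, bias):
    # Hypothetical linear head mapping the fused features to a scalar score.
    return dot(features, weights) + bias

# Toy stand-ins for encoder outputs (assumed values, not real CLIP embeddings).
img_global = [1.0, 0.0]
img_patches = [[1.0, 0.0], [0.0, 1.0]]
txt_global = [1.0, 0.0]
txt_tokens = [[1.0, 0.0]]

features = fuse(img_global, img_patches, txt_global, txt_tokens)
score = predict_score(features, weights=[0.5, 0.5], bias=0.0)
print(features, score)
```

In a real pipeline the hand-written vectors would be replaced by the global and per-patch/per-token outputs of an image-text encoder, and the head would be trained against human beauty ratings; neither step is shown here.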
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.