MDFA: A Quantitative Framework for the Analysis of Multimodal Facial Esthetics
Authors: Huanyu Chen; Weisheng Li; Bin Xiao; Xinbo Gao
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 9, pp. 15779-15793 (Journal Article; impact factor 8.9; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-06-18
DOI: 10.1109/TNNLS.2025.3570389
URL: https://ieeexplore.ieee.org/document/11038820/
In the era of big data, facial beauty prediction (FBP) has been addressed by combining deep learning with data- and model-based esthetics. Most existing methods process 2-D unimodal information. Owing to the high cost of 3-D data acquisition equipment, studies that use multimodal 2-D and 3-D features for esthetic evaluation are scarce. Moreover, most existing methods rely on self-built 3-D datasets, which limits their applicability to practical scenarios involving 2-D facial images. This study proposes a label distribution-based multimodal facial esthetic analysis framework (LDMFE). The LDMFE performs facial esthetic evaluation by combining 2-D and 3-D information, following the process the human brain uses to conduct 3-D esthetic evaluation. FBP is performed by extracting facial depth structure information with a depth information extraction network, DIENet, which comprises a facial structure perception layer (FSP-Layer) and an attention decision block (AD-Block). Furthermore, to ensure a high degree of agreement between the network's predicted label distribution and the true distribution, a simple and efficient distribution measurement loss function, $\mathcal{L}_{\text{WD}}$, is proposed. Compared with the label distribution-based FBP loss and the latest FBP loss, $\mathcal{L}_{\text{WD}}$ is more stable and effective. The performance of LDMFE was evaluated on three datasets, and the experimental results demonstrate that it achieves state-of-the-art performance.
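The abstract does not give the formula for $\mathcal{L}_{\text{WD}}$, so the following is only an illustrative sketch of what a distribution measurement loss over label distributions can look like, not the authors' method. It assumes, hypothetically, that "WD" denotes a Wasserstein-style distance, that ratings lie on a 1 to 5 scale, and that a label distribution is built by discretizing a Gaussian around a mean score; the function names `label_distribution`, `cdf_distance`, and `expected_score` are invented for this example.

```python
import math


def label_distribution(mean, std, scores):
    """Discretize a Gaussian centred on `mean` over the rating levels in `scores`.

    Assumption for illustration: the abstract does not say how label
    distributions are constructed.
    """
    weights = [math.exp(-0.5 * ((s - mean) / std) ** 2) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def cdf_distance(p, q):
    """1-D Wasserstein-1 distance between two histograms on a unit-spaced grid.

    Computed as the sum of absolute differences between the two cumulative
    distributions. This is a stand-in, not the paper's L_WD.
    """
    dist, cum_p, cum_q = 0.0, 0.0, 0.0
    for p_i, q_i in zip(p, q):
        cum_p += p_i
        cum_q += q_i
        dist += abs(cum_p - cum_q)
    return dist


def expected_score(p, scores):
    """Collapse a label distribution into a single predicted score."""
    return sum(p_i * s for p_i, s in zip(p, scores))


scores = [1, 2, 3, 4, 5]                          # hypothetical rating levels
target = label_distribution(3.2, 0.7, scores)     # "true" distribution
predicted = label_distribution(3.8, 0.9, scores)  # network output stand-in

print("distribution distance:", cdf_distance(predicted, target))
print("predicted score:", expected_score(predicted, scores))
```

Unlike a bin-by-bin comparison, a CDF-based distance grows with how far probability mass has to move along the rating scale, so predicting 5 when the truth is 1 costs more than predicting 2. Whether the paper's loss has this property cannot be confirmed from the abstract alone.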
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.