Learning facial structural dependency in 3D aligned space for face alignment

IF 4.2; Zone 3 (Computer Science); JCR Q2 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Image and Vision Computing. Pub Date: 2024-08-23. DOI: 10.1016/j.imavis.2024.105241

Biying Li, Zhiwei Liu, Jinqiao Wang

Volume 150, Article 105241. Publication type: Journal Article. Source: https://www.sciencedirect.com/science/article/pii/S0262885624003469

Citations: 0

Abstract


Learning facial structural dependency in 3D aligned space for face alignment


The statistical characteristics of facial structure offer pivotal prior information for facial landmark prediction, forming inter-dependencies among different landmarks. Such inter-dependencies ensure that predictions adhere to the shape distribution typical of natural faces. In challenging scenarios such as occlusions or extreme facial poses, this structure becomes indispensable, because it helps to predict elusive landmarks from more discernible ones. While current deep learning methods do capture these landmark dependencies, they often do so implicitly and rely heavily on vast training datasets. We contend that such implicit modeling approaches fail to manage more challenging situations. In this paper, we propose a new method that harnesses the facial structure and explicitly explores inter-dependencies among facial landmarks in an end-to-end fashion. We propose a Structural Dependency Learning Module (SDLM). It uses 3D face information to map facial features into a canonical UV space, in which the facial structure is explicitly and semantically aligned in 3D. In addition, to explore the global relationships between facial landmarks, we take advantage of the self-attention mechanism in the image and UV spaces. We name the proposed method Facial Structure-based Face Alignment (FSFA). FSFA reinforces the landmark structure, especially under challenging conditions. Extensive experiments demonstrate that FSFA achieves state-of-the-art performance on the WFLW, 300W, AFLW, and COFW68 datasets.
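
The abstract refers to self-attention as the way global relationships between landmarks are explored. As a rough illustration only, the sketch below shows generic scaled dot-product self-attention over a few landmark feature vectors. It is not the authors' implementation: the function names, the toy feature values, and the use of identity query/key/value projections (a real model would learn these) are all assumptions made for this example.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    tokens: list of N feature vectors (one per landmark), each of length d.
    Returns (outputs, weights), where weights[i][j] is how strongly
    landmark i attends to landmark j.
    Assumption: queries, keys and values are the raw tokens themselves;
    learned projection matrices are omitted to keep the sketch short.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    weights = []
    for q in tokens:
        # Similarity of this landmark's query to every landmark's key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights.append(softmax(scores))
    # Each output is a weighted average of all landmark features.
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights

# Hypothetical toy features for three landmarks (values are made up).
landmark_features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
outputs, weights = self_attention(landmark_features)
```

In this toy setup, each landmark's output mixes information from all the others, weighted by feature similarity, which is the sense in which attention lets a poorly observed landmark borrow evidence from more discernible ones.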


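Source journal

Image and Vision Computing (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 8.50

Self-citation rate: 8.50%

Articles published: 143

Review time: 7.8 months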

About the journal: The primary aim of Image and Vision Computing is to provide an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.