2S-SGCN: A two-stage stratified graph convolutional network model for facial landmark detection on 3D data

IF 4.3 · CAS Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)

Computer Vision and Image Understanding · Pub Date: 2024-11-12 · DOI: 10.1016/j.cviu.2024.104227

Jacopo Burger, Giorgio Blandano, Giuseppe Maurizio Facchi, Raffaella Lanzarotti

Volume 250, Article 104227. Full text: https://www.sciencedirect.com/science/article/pii/S1077314224003084

Citations: 0

Abstract

Facial Landmark Detection (FLD) algorithms play a crucial role in numerous computer vision applications, particularly in tasks such as face recognition, head pose estimation, and facial expression analysis. While FLD on images has long been the focus, the growing availability of 3D data has driven a surge of interest in FLD on 3D inputs, given its potential applications in various fields, including medical research. However, automating FLD in this context presents significant challenges, such as selecting suitable network architectures, refining outputs for precise landmark localization, and optimizing computational efficiency. In response, this paper presents a novel approach, the 2-Stage Stratified Graph Convolutional Network (2S-SGCN), which addresses these challenges. The first stage detects landmark regions using heatmap regression, leveraging both local and long-range dependencies through a stratified approach. In the second stage, 3D landmarks are precisely determined using a new post-processing technique, MSE-over-mesh. 2S-SGCN is designed to be efficient and suitable for resource-constrained devices. Experimental results on 3D scans from the public Facescape and Headspace datasets, as well as on point clouds derived from FLAME meshes collected in the DAD-3DHeads dataset, demonstrate that the proposed method achieves state-of-the-art performance across various conditions. Source code is accessible at https://github.com/gfacchi-dev/CVIU-2S-SGCN.
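
To make the idea of heatmap regression on 3D data concrete, the following is a minimal, generic sketch and not the authors' implementation. It assumes a point cloud given as an (N, 3) array, builds Gaussian per-point heatmap targets (one per landmark), and decodes landmarks with a simple heat-weighted centroid. The function names, the Gaussian target, the `sigma` and `top_k` parameters, and the centroid decoding are all illustrative assumptions; in particular, the decoding step is a stand-in and does not reproduce the paper's MSE-over-mesh technique, whose details are not given in the abstract.

```python
import numpy as np

def gaussian_heatmaps(points, landmarks, sigma=0.05):
    """Build per-point regression targets (hypothetical helper).

    points:    (N, 3) point cloud or mesh vertices
    landmarks: (K, 3) ground-truth landmark positions
    returns:   (N, K) heat values in (0, 1], peaking near each landmark
    """
    # Squared Euclidean distance from every point to every landmark.
    d2 = ((points[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def decode_landmarks(points, heatmaps, top_k=10):
    """Naive decoding (assumption): heat-weighted centroid of the
    top_k hottest points for each landmark channel."""
    idx = np.argsort(heatmaps, axis=0)[-top_k:]        # (top_k, K)
    w = np.take_along_axis(heatmaps, idx, axis=0)      # (top_k, K)
    p = points[idx]                                    # (top_k, K, 3)
    return (w[..., None] * p).sum(axis=0) / w.sum(axis=0)[:, None]
```

In a two-stage setup of this kind, a network would be trained to predict the (N, K) heatmaps from the input geometry, and a refinement step would then turn each predicted heat region into a single 3D coordinate; the centroid above is only the simplest possible placeholder for that second step.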


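Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80

Self-citation rate: 4.40%

Articles published: 112

Review time: 79 days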

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems