Pyramidal structure-correlated refinement for robust face alignment

IF 7.6 | Zone 1, Computer Science | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems | Pub Date: 2025-07-24 | DOI: 10.1016/j.knosys.2025.114098

Qiyuan Dai, Qiang Ling

Knowledge-Based Systems, Volume 327, Article 114098. Journal article, not open access. Full text: https://www.sciencedirect.com/science/article/pii/S0950705125011438

Citations: 0

Abstract

Recent face alignment methods attempt to capture representations of facial landmarks and learn the correlations between them. However, they often ignore the consistency between local landmarks and the overall face shape, which may make correlation learning between distant landmarks inefficient. In addition, because localization is uncertain, these methods may capture invalid local cues in landmark representations. To resolve these issues, we propose a pyramidal structure-correlated refinement method that integrates a novel fusion interactor into a pyramidal refinement framework. Specifically, we introduce a fusion interactor that aggregates the local regression cues of landmark representations into a global representation and encodes facial structure information. This facial structure information is then allocated to the local representations to compensate for missing landmark contexts, such as occluded parts. Unlike vanilla attention mechanisms, our fusion interactor performs indirect interaction to avoid inconsistent landmark contexts and adds only a small computational overhead. Additionally, to obtain valid local cues for landmarks, we introduce a pyramidal refinement framework with multi-scale feature maps, which samples landmark representations from feature maps of specific scales according to the uncertainty of the sampling positions. It also gradually regularizes the global representation with correct multi-scale spatial contexts to constrain the overall face shape. Experiments on several popular benchmarks demonstrate the effectiveness and robustness of the proposed method, especially its notably low failure rates in challenging scenarios.
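
The abstract describes two mechanisms only at a high level: indirect interaction through a single global representation, and scale selection driven by positional uncertainty. The sketch below is a toy illustration of those two ideas and is not the authors' implementation. Every name in it (`fuse_via_global`, `pick_pyramid_level`, `query`, `thresholds`) is hypothetical, the softmax-weighted pooling and residual addition are assumed stand-ins for whatever the paper actually uses, and the rule "higher uncertainty maps to a coarser level" is an assumption as well.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_via_global(local_feats, query):
    """Indirect interaction: landmarks -> one global vector -> landmarks.

    local_feats: list of N landmark vectors, each of length d.
    query: hypothetical learned vector of length d used to score landmarks.
    Returns (global_feat, updated_local_feats).
    """
    d = len(query)
    # Aggregate: score each landmark by its scaled dot product with the query.
    scores = [
        sum(f_k * q_k for f_k, q_k in zip(f, query)) / math.sqrt(d)
        for f in local_feats
    ]
    weights = softmax(scores)
    global_feat = [
        sum(w * f[k] for w, f in zip(weights, local_feats)) for k in range(d)
    ]
    # Allocate: add the shared global vector back to every landmark (residual).
    updated = [[f[k] + global_feat[k] for k in range(d)] for f in local_feats]
    return global_feat, updated


def pick_pyramid_level(uncertainty, thresholds):
    """Map a landmark's positional uncertainty to a feature-map level.

    Level 0 is the finest map. Assumption: each ascending threshold that the
    uncertainty exceeds moves the landmark one level coarser.
    """
    level = 0
    for t in thresholds:
        if uncertainty > t:
            level += 1
    return level


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    g, new_feats = fuse_via_global(feats, query=[0.0, 0.0])
    print(g, new_feats)
    print(pick_pyramid_level(0.2, thresholds=[0.1, 0.3]))
```

In this sketch each landmark touches only the shared global vector, so the work grows linearly with the number of landmarks N, whereas pairwise self-attention compares all N × N landmark pairs. That is one plausible reading of the abstract's claim that indirect interaction keeps the added computation small.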

Source journal

Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.