Haiying Xia, Zhuolin Gong, Yumei Tan, Shuxiang Song
Joint pyramidal perceptual attention and hierarchical consistency constraint for gaze estimation
Computer Vision and Image Understanding (IF 4.3, JCR Q2: Computer Science, Artificial Intelligence), published 2024-08-02
DOI: 10.1016/j.cviu.2024.104105
https://www.sciencedirect.com/science/article/pii/S1077314224001863
Citation count: 0
Abstract
Eye gaze provides valuable cues about human intent, making gaze estimation an active research topic. Extracting multi-scale information has recently proven effective for gaze estimation in complex scenarios. However, existing methods for estimating gaze based on multi-scale features tend to focus only on information from single-level feature maps. Furthermore, information across different scales may lack relevance to one another. To address these issues, we propose a novel joint pyramidal perceptual attention and hierarchical consistency constraint (PaCo) for gaze estimation. The proposed PaCo consists of two main components: a pyramidal perceptual attention module (PPAM) and a hierarchical consistency constraint (HCC). Specifically, PPAM first extracts multi-scale spatial features using a pyramid structure, and then aggregates information from coarse granularity to fine granularity. In this way, PPAM enables the model to focus simultaneously on both the eye region and the facial region at multiple scales. Then, HCC imposes a consistency constraint on low-level and high-level features, aiming to enhance the gaze semantic consistency between different feature levels. With the combination of PPAM and HCC, PaCo can learn more discriminative features in complex situations. Extensive experimental results show that PaCo achieves significant performance improvements on challenging datasets such as Gaze360, MPIIFaceGaze, and RT-GENE, reducing errors to 10.27°, 3.23°, and 6.46°, respectively.
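The errors quoted above are given in degrees. The abstract does not define the metric, but gaze estimation results are usually reported as the angle between the predicted and ground-truth 3D gaze directions. The following is a minimal sketch under that assumption; the function names `gaze_to_vector` and `angular_error_deg` and the pitch/yaw sign convention are illustrative choices, not taken from the paper.

```python
import math


def gaze_to_vector(pitch, yaw):
    """Convert pitch/yaw (radians) to a 3D unit gaze vector.

    Assumed convention (hypothetical, not from the paper): the camera
    looks along +z, so a gaze straight at the camera is (0, 0, -1).
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    if norm_pred == 0.0 or norm_target == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_pred * norm_target)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    predicted = gaze_to_vector(0.0, math.radians(10.0))
    ground_truth = gaze_to_vector(0.0, 0.0)
    print(angular_error_deg(predicted, ground_truth))
```

Under this reading, a dataset-level figure such as 3.23° would be the mean of this per-sample angle over the test set.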
About the journal:
The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research Areas Include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems