Zhengdan Yin, Sanping Zhou, Le Wang, Tao Dai, Gang Hua, Nanning Zheng
Machine Vision and Applications, published 2024-05-05. DOI: https://doi.org/10.1007/s00138-024-01545-z
Citations: 0
Residual feature learning with hierarchical calibration for gaze estimation
Gaze estimation aims to predict accurate gaze direction from natural eye images, which is an extremely challenging task because of both random variations in head pose and person-specific biases. Existing works often learn features from the two eye images independently and directly concatenate them for gaze estimation. In this paper, we propose a simple yet effective two-stage framework for gaze estimation, in which a residual feature learning (RFL) network and a hierarchical gaze calibration (HGC) network are designed to consistently improve gaze estimation performance. Specifically, the RFL network extracts informative features by jointly exploring the symmetric and asymmetric factors between the left and right eyes, so that the initial predictions are as accurate as possible. In addition, the HGC network cascades a person-specific transform module that further transforms the distribution of gaze points from coarse to fine, which effectively compensates for the subjective bias in the initial predictions. Extensive experiments on both the EVE and MPIIGaze datasets show that our method outperforms state-of-the-art approaches.
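The abstract gives no implementation details, so the following is only a toy sketch of the two ideas it names, not the authors' RFL or HGC networks. It assumes, purely for illustration, that "symmetric" and "asymmetric" factors can be pictured as the mean and half-difference of left-eye and right-eye feature vectors, and that person-specific calibration can be pictured as a per-axis affine correction fitted by least squares on a few calibration samples. All function names and data below are hypothetical.

```python
from statistics import mean


def split_symmetric_asymmetric(left, right):
    """Split two equal-length feature vectors into shared and differing parts.

    Hypothetical decomposition (an assumption, not from the paper): the mean
    captures what both eyes agree on, the half-difference what differs.
    """
    if len(left) != len(right):
        raise ValueError("feature vectors must have the same length")
    symmetric = [(l + r) / 2 for l, r in zip(left, right)]
    asymmetric = [(l - r) / 2 for l, r in zip(left, right)]
    return symmetric, asymmetric


def fit_affine_1d(predicted, target):
    """Least-squares fit of target ~= scale * predicted + offset."""
    p_mean, t_mean = mean(predicted), mean(target)
    var = sum((p - p_mean) ** 2 for p in predicted)
    if var == 0:
        # Degenerate calibration set: fall back to a pure offset correction.
        return 1.0, t_mean - p_mean
    cov = sum((p - p_mean) * (t - t_mean) for p, t in zip(predicted, target))
    scale = cov / var
    return scale, t_mean - scale * p_mean


def calibrate(initial, params):
    """Apply per-axis (scale, offset) pairs to an initial gaze prediction."""
    return tuple(s * v + o for v, (s, o) in zip(initial, params))


# Toy calibration data for one person: initial predictions vs. known targets.
# The values are made up for illustration only.
initial_preds = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
true_gaze = [(0.5, -1.0), (1.5, 0.0), (2.5, 1.0)]

# Fit one (scale, offset) pair per gaze axis.
params = [
    fit_affine_1d([p[axis] for p in initial_preds], [t[axis] for t in true_gaze])
    for axis in range(2)
]

# Correct a new initial prediction with the person-specific parameters.
corrected = calibrate((3.0, 6.0), params)
```

In the paper, both stages are learned networks and the calibration is hierarchical (coarse to fine). The fixed affine correction here stands in only for the general idea of refining an initial prediction with person-specific parameters.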
About the journal:
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submissions on all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.