DMAGaze: Gaze estimation using feature disentanglement and multi-scale attention

Impact Factor: 3.3 | CAS Tier 3 (Computer Science) | JCR Q2 (Computer Science, Artificial Intelligence)
Pattern Recognition Letters | Publication date: 2026-03-01 (Epub: 2026-01-13) | DOI: 10.1016/j.patrec.2026.01.013
Haohan Chen, Hongjia Liu, Shiyong Lan, Wenwu Wang, Yixin Qiao, Yao Li, Guonan Deng
{"title":"DMAGaze : Gaze estimation using feature disentanglement and multi-scale attention","authors":"Haohan Chen ,&nbsp;Hongjia Liu ,&nbsp;Shiyong Lan ,&nbsp;Wenwu Wang ,&nbsp;Yixin Qiao ,&nbsp;Yao Li ,&nbsp;Guonan Deng","doi":"10.1016/j.patrec.2026.01.013","DOIUrl":null,"url":null,"abstract":"<div><div>Gaze estimation, which predicts gaze direction, commonly faces the challenge of interference from complex gaze-irrelevant information in face images—a key bottleneck limiting its accuracy in real-world scenarios. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects: gaze-relevant global features (disentangled from facial image), local eye features (extracted from cropped eye patch), and head pose related features, to improve overall performance. Firstly, we design a new continuous mask-based Disentangler to separate gaze-relevant and gaze-irrelevant information in facial images through reconstructing the eye and non-eye regions using a dual-branch architecture. Furthermore, we introduce a new attention module, called Multi-Scale Global Local Attention Module (MS-GLAM), to fuse the global and local information at multiple scales via a customized attention structure, thereby further enhancing the information from the Disentangler. Finally, we combine the global gaze-relevant features, with head pose and local eye features, and pass them through the detection head for high-precision gaze estimation. Our proposed DMAGaze has been evaluated extensively on two widely used public datasets: obtaining a gaze estimation error of 3.74° on MPIIFaceGaze and 6.17° on RT-GENE, outperforming SOTA methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"201 ","pages":"Pages 109-116"},"PeriodicalIF":3.3000,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865526000218","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/1/13 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Gaze estimation, which predicts gaze direction, commonly faces interference from complex gaze-irrelevant information in face images, a key bottleneck that limits its accuracy in real-world scenarios. In this work, we propose DMAGaze, a novel gaze estimation framework that exploits information from facial images in three aspects to improve overall performance: gaze-relevant global features (disentangled from the facial image), local eye features (extracted from cropped eye patches), and head-pose-related features. First, we design a new continuous-mask-based Disentangler that separates gaze-relevant and gaze-irrelevant information in facial images by reconstructing the eye and non-eye regions with a dual-branch architecture. Second, we introduce a new attention module, the Multi-Scale Global Local Attention Module (MS-GLAM), which fuses global and local information at multiple scales via a customized attention structure, further enhancing the information from the Disentangler. Finally, we combine the global gaze-relevant features with the head pose and local eye features and pass them through the detection head for high-precision gaze estimation. DMAGaze has been evaluated extensively on two widely used public datasets, achieving gaze estimation errors of 3.74° on MPIIFaceGaze and 6.17° on RT-GENE and outperforming state-of-the-art (SOTA) methods.
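The abstract describes the architecture only at a high level. The following PyTorch sketch illustrates one plausible shape for the three components it names (the continuous-mask Disentangler, MS-GLAM fusion, and the combined detection head); every class name, layer choice, and dimension below is an assumption made for illustration, not the authors' implementation.

```python
# A minimal, hypothetical sketch of the DMAGaze pipeline described above.
# Module names, layer choices, and dimensions are assumptions made for
# illustration; the paper's actual architecture is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Disentangler(nn.Module):
    """Dual-branch disentangler (assumed form): a shared encoder produces
    features, a continuous (soft) mask splits them into gaze-relevant and
    gaze-irrelevant streams, and two decoders reconstruct the eye and
    non-eye regions, respectively."""

    def __init__(self, ch: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Sigmoid keeps the mask continuous in [0, 1] rather than binary.
        self.mask_head = nn.Sequential(nn.Conv2d(ch, 1, 1), nn.Sigmoid())
        self.eye_decoder = self._decoder(ch)
        self.rest_decoder = self._decoder(ch)

    @staticmethod
    def _decoder(ch: int) -> nn.Sequential:
        return nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1),
        )

    def forward(self, face):
        feat = self.encoder(face)
        m = self.mask_head(feat)                        # continuous mask
        gaze_feat = feat * m                            # gaze-relevant stream
        eye_recon = self.eye_decoder(gaze_feat)         # reconstruct eye region
        rest_recon = self.rest_decoder(feat * (1 - m))  # reconstruct non-eye region
        return gaze_feat, eye_recon, rest_recon


class MSGLAM(nn.Module):
    """Multi-Scale Global Local Attention Module (assumed form): pools the
    global face features at several scales and lets the local eye tokens
    attend to each pooled scale before fusing the results."""

    def __init__(self, dim: int = 64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(dim * len(scales), dim)

    def forward(self, global_feat, local_tokens):
        # global_feat: (B, C, H, W); local_tokens: (B, N, C)
        fused = []
        for s in self.scales:
            g = F.adaptive_avg_pool2d(global_feat, s)   # (B, C, s, s)
            g = g.flatten(2).transpose(1, 2)            # (B, s*s, C)
            out, _ = self.attn(local_tokens, g, g)      # local queries global
            fused.append(out.mean(dim=1))               # (B, C)
        return self.fuse(torch.cat(fused, dim=-1))      # (B, C)


class DMAGaze(nn.Module):
    """End-to-end sketch: face -> disentangled global features; eye patch ->
    local tokens; MS-GLAM fusion; a head that also takes head pose."""

    def __init__(self, dim: int = 64, pose_dim: int = 3):
        super().__init__()
        self.disentangler = Disentangler(dim)
        self.eye_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.ms_glam = MSGLAM(dim)
        self.head = nn.Sequential(
            nn.Linear(dim + pose_dim, 128), nn.ReLU(), nn.Linear(128, 2),
        )

    def forward(self, face, eyes, head_pose):
        gaze_feat, eye_recon, rest_recon = self.disentangler(face)
        tokens = self.eye_encoder(eyes).flatten(2).transpose(1, 2)
        fused = self.ms_glam(gaze_feat, tokens)
        gaze = self.head(torch.cat([fused, head_pose], dim=-1))  # (yaw, pitch)
        return gaze, eye_recon, rest_recon


model = DMAGaze()
gaze, _, _ = model(torch.randn(2, 3, 64, 64),   # face crop
                   torch.randn(2, 3, 32, 64),   # eye patch
                   torch.randn(2, 3))           # head pose vector
print(gaze.shape)  # torch.Size([2, 2])
```

In this reading, the soft mask routes features into two reconstruction branches so that the gaze-relevant stream is forced to explain the eye region; training would presumably combine reconstruction losses on the two decoder outputs with an angular loss on the gaze prediction, though the abstract does not spell out the objective.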
Source journal: Pattern Recognition Letters (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Annual articles: 287
Review time: 9.1 months
Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, as well as other developing themes involving learning and recognition.