{"title":"maff - gaze:基于多尺度自适应特征调制的车载凝视估计","authors":"Gan Zhang, Yafei Wang, Runze Yan, Xianping Fu","doi":"10.1016/j.displa.2025.103226","DOIUrl":null,"url":null,"abstract":"<div><div>Gaze estimation is a critical component of Driver Monitoring and Assistance Systems (DMAS), as it effectively identifies driver distraction and fatigue during driving, thus enhancing driving safety. Existing methods face challenges in achieving accurate gaze estimation in driving scenarios, due to complex factors such as illumination variation, facial occlusion, and extreme head poses. Therefore, an in-vehicle gaze estimation method with multi-scale adaptive feature modulation (MAFM-Gaze) is proposed in this paper. In MAFM-Gaze, the model takes only facial images as input and employs a pruned cross-stage partial (CSP) network to extract multi-scale features efficiently. A 3D Spatially Adaptive Feature Modulation (3D-SAFM) module is integrated into the feature mixing network, incorporating a multi-head concept with independent computation to fully exploit multi-scale features at each level, thereby enriching the current layer with critical global information and long-range dependencies. Additionally, a 3D Multi-scale Feature Fusion Module (3D-MFFM) is introduced to extract scale-invariant information and capture deeper interrelationships among multi-scale features. Experimental results shown that our model outperforms existing state-of-the-art in-vehicle gaze estimation methods, achieving a mean angular error of 6.65°with a compact model size of only 44.6MB. The code will be available at <span><span>https://github.com/zhanggan123456/MAFM-Gaze</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103226"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MAFM-Gaze: Multi-scale adaptive feature modulation for in-vehicle gaze estimation\",\"authors\":\"Gan Zhang, Yafei Wang, Runze Yan, Xianping Fu\",\"doi\":\"10.1016/j.displa.2025.103226\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Gaze estimation is a critical component of Driver Monitoring and Assistance Systems (DMAS), as it effectively identifies driver distraction and fatigue during driving, thus enhancing driving safety. Existing methods face challenges in achieving accurate gaze estimation in driving scenarios, due to complex factors such as illumination variation, facial occlusion, and extreme head poses. Therefore, an in-vehicle gaze estimation method with multi-scale adaptive feature modulation (MAFM-Gaze) is proposed in this paper. In MAFM-Gaze, the model takes only facial images as input and employs a pruned cross-stage partial (CSP) network to extract multi-scale features efficiently. A 3D Spatially Adaptive Feature Modulation (3D-SAFM) module is integrated into the feature mixing network, incorporating a multi-head concept with independent computation to fully exploit multi-scale features at each level, thereby enriching the current layer with critical global information and long-range dependencies. Additionally, a 3D Multi-scale Feature Fusion Module (3D-MFFM) is introduced to extract scale-invariant information and capture deeper interrelationships among multi-scale features. 
Experimental results shown that our model outperforms existing state-of-the-art in-vehicle gaze estimation methods, achieving a mean angular error of 6.65°with a compact model size of only 44.6MB. The code will be available at <span><span>https://github.com/zhanggan123456/MAFM-Gaze</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"91 \",\"pages\":\"Article 103226\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S014193822500263X\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S014193822500263X","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
MAFM-Gaze: Multi-scale adaptive feature modulation for in-vehicle gaze estimation
Gaze estimation is a critical component of Driver Monitoring and Assistance Systems (DMAS), as it effectively identifies driver distraction and fatigue during driving, thus enhancing driving safety. Existing methods struggle to achieve accurate gaze estimation in driving scenarios due to complex factors such as illumination variation, facial occlusion, and extreme head poses. Therefore, an in-vehicle gaze estimation method with multi-scale adaptive feature modulation (MAFM-Gaze) is proposed in this paper. In MAFM-Gaze, the model takes only facial images as input and employs a pruned cross-stage partial (CSP) network to extract multi-scale features efficiently. A 3D Spatially Adaptive Feature Modulation (3D-SAFM) module is integrated into the feature mixing network, incorporating a multi-head concept with independent computation to fully exploit multi-scale features at each level, thereby enriching the current layer with critical global information and long-range dependencies. Additionally, a 3D Multi-scale Feature Fusion Module (3D-MFFM) is introduced to extract scale-invariant information and capture deeper interrelationships among multi-scale features. Experimental results show that our model outperforms existing state-of-the-art in-vehicle gaze estimation methods, achieving a mean angular error of 6.65° with a compact model size of only 44.6 MB. The code will be available at https://github.com/zhanggan123456/MAFM-Gaze.
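The abstract gives only a high-level description of the 3D-SAFM block, but the general mechanism of spatially adaptive feature modulation (several heads computed independently at different spatial scales, then recombined to modulate the input feature map) can be illustrated. Below is a minimal 2D sketch in PyTorch; the class name SAFMSketch, the head count, the pooling scales, and every other implementation detail are assumptions for illustration, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SAFMSketch(nn.Module):
    """Hypothetical spatially adaptive feature modulation block (2D sketch).

    Channels are split into independent heads; each head is pooled to a
    different spatial scale, filtered with a depthwise convolution, and
    upsampled back, and the concatenated result modulates the input.
    """

    def __init__(self, channels: int, n_heads: int = 4):
        super().__init__()
        assert channels % n_heads == 0, "channels must split evenly across heads"
        self.n_heads = n_heads
        head_ch = channels // n_heads
        # One independent depthwise conv per head ("independent computation").
        self.dw_convs = nn.ModuleList(
            nn.Conv2d(head_ch, head_ch, 3, padding=1, groups=head_ch, bias=False)
            for _ in range(n_heads)
        )
        self.proj = nn.Conv2d(channels, channels, 1)  # mix heads back together
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        outs = []
        for i, (head, conv) in enumerate(zip(x.chunk(self.n_heads, dim=1), self.dw_convs)):
            if i == 0:
                feat = conv(head)  # full-resolution head keeps local detail
            else:
                # Coarser heads capture longer-range context at lower cost.
                scale = 2 ** i
                pooled = F.adaptive_max_pool2d(head, (max(h // scale, 1), max(w // scale, 1)))
                feat = F.interpolate(conv(pooled), size=(h, w), mode="nearest")
            outs.append(feat)
        # The aggregated multi-scale map gates the input element-wise.
        attn = self.act(self.proj(torch.cat(outs, dim=1)))
        return x * attn


if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)   # e.g., a mid-level facial feature map
    print(SAFMSketch(64)(x).shape)   # torch.Size([1, 64, 56, 56])

Each head past the first pools its channel group to a progressively coarser resolution, which is how a block of this kind injects global context and long-range dependencies into the current layer; a 3D variant, as the name 3D-SAFM suggests, would presumably extend the pooling and depthwise convolutions along a third axis. The repository linked above is the authoritative implementation.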
Journal introduction:
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human-factors engineers new to the field, will also occasionally be featured.