{"title":"maff - gaze:基于多尺度自适应特征调制的车载凝视估计","authors":"Gan Zhang, Yafei Wang, Runze Yan, Xianping Fu","doi":"10.1016/j.displa.2025.103226","DOIUrl":null,"url":null,"abstract":"<div><div>Gaze estimation is a critical component of Driver Monitoring and Assistance Systems (DMAS), as it effectively identifies driver distraction and fatigue during driving, thus enhancing driving safety. Existing methods face challenges in achieving accurate gaze estimation in driving scenarios, due to complex factors such as illumination variation, facial occlusion, and extreme head poses. Therefore, an in-vehicle gaze estimation method with multi-scale adaptive feature modulation (MAFM-Gaze) is proposed in this paper. In MAFM-Gaze, the model takes only facial images as input and employs a pruned cross-stage partial (CSP) network to extract multi-scale features efficiently. A 3D Spatially Adaptive Feature Modulation (3D-SAFM) module is integrated into the feature mixing network, incorporating a multi-head concept with independent computation to fully exploit multi-scale features at each level, thereby enriching the current layer with critical global information and long-range dependencies. Additionally, a 3D Multi-scale Feature Fusion Module (3D-MFFM) is introduced to extract scale-invariant information and capture deeper interrelationships among multi-scale features. Experimental results shown that our model outperforms existing state-of-the-art in-vehicle gaze estimation methods, achieving a mean angular error of 6.65°with a compact model size of only 44.6MB. The code will be available at <span><span>https://github.com/zhanggan123456/MAFM-Gaze</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"91 ","pages":"Article 103226"},"PeriodicalIF":3.4000,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MAFM-Gaze: Multi-scale adaptive feature modulation for in-vehicle gaze estimation\",\"authors\":\"Gan Zhang, Yafei Wang, Runze Yan, Xianping Fu\",\"doi\":\"10.1016/j.displa.2025.103226\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Gaze estimation is a critical component of Driver Monitoring and Assistance Systems (DMAS), as it effectively identifies driver distraction and fatigue during driving, thus enhancing driving safety. Existing methods face challenges in achieving accurate gaze estimation in driving scenarios, due to complex factors such as illumination variation, facial occlusion, and extreme head poses. Therefore, an in-vehicle gaze estimation method with multi-scale adaptive feature modulation (MAFM-Gaze) is proposed in this paper. In MAFM-Gaze, the model takes only facial images as input and employs a pruned cross-stage partial (CSP) network to extract multi-scale features efficiently. A 3D Spatially Adaptive Feature Modulation (3D-SAFM) module is integrated into the feature mixing network, incorporating a multi-head concept with independent computation to fully exploit multi-scale features at each level, thereby enriching the current layer with critical global information and long-range dependencies. Additionally, a 3D Multi-scale Feature Fusion Module (3D-MFFM) is introduced to extract scale-invariant information and capture deeper interrelationships among multi-scale features. 
Experimental results shown that our model outperforms existing state-of-the-art in-vehicle gaze estimation methods, achieving a mean angular error of 6.65°with a compact model size of only 44.6MB. The code will be available at <span><span>https://github.com/zhanggan123456/MAFM-Gaze</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"91 \",\"pages\":\"Article 103226\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-09-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S014193822500263X\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S014193822500263X","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
MAFM-Gaze: Multi-scale adaptive feature modulation for in-vehicle gaze estimation
Gaze estimation is a critical component of Driver Monitoring and Assistance Systems (DMAS), as it effectively identifies driver distraction and fatigue during driving, thus enhancing driving safety. Existing methods struggle to achieve accurate gaze estimation in driving scenarios due to complex factors such as illumination variation, facial occlusion, and extreme head poses. Therefore, an in-vehicle gaze estimation method with multi-scale adaptive feature modulation (MAFM-Gaze) is proposed in this paper. In MAFM-Gaze, the model takes only facial images as input and employs a pruned cross-stage partial (CSP) network to extract multi-scale features efficiently. A 3D Spatially Adaptive Feature Modulation (3D-SAFM) module is integrated into the feature mixing network, incorporating a multi-head concept with independent computation to fully exploit multi-scale features at each level, thereby enriching the current layer with critical global information and long-range dependencies. Additionally, a 3D Multi-scale Feature Fusion Module (3D-MFFM) is introduced to extract scale-invariant information and capture deeper interrelationships among multi-scale features. Experimental results show that our model outperforms existing state-of-the-art in-vehicle gaze estimation methods, achieving a mean angular error of 6.65° with a compact model size of only 44.6 MB. The code will be available at https://github.com/zhanggan123456/MAFM-Gaze.
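The abstract gives only a high-level description of the 3D-SAFM block, but the general mechanism of spatially adaptive feature modulation (several heads computed independently at different spatial scales, then recombined to modulate the input feature map) can be illustrated. Below is a minimal 2D sketch in PyTorch; the class name SAFMSketch, the head count, the pooling scales, and every other implementation detail are assumptions for illustration, not the authors' released code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SAFMSketch(nn.Module):
    """Hypothetical spatially adaptive feature modulation block (2D sketch).

    Channels are split into independent heads; each head is pooled to a
    different spatial scale, filtered with a depthwise convolution, and
    upsampled back, and the concatenated result modulates the input.
    """

    def __init__(self, channels: int, n_heads: int = 4):
        super().__init__()
        assert channels % n_heads == 0, "channels must split evenly across heads"
        self.n_heads = n_heads
        head_ch = channels // n_heads
        # One independent depthwise conv per head ("independent computation").
        self.dw_convs = nn.ModuleList(
            nn.Conv2d(head_ch, head_ch, 3, padding=1, groups=head_ch, bias=False)
            for _ in range(n_heads)
        )
        self.proj = nn.Conv2d(channels, channels, 1)  # mix heads back together
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        outs = []
        for i, (head, conv) in enumerate(zip(x.chunk(self.n_heads, dim=1), self.dw_convs)):
            if i == 0:
                feat = conv(head)  # full-resolution head keeps local detail
            else:
                # Coarser heads capture longer-range context at lower cost.
                scale = 2 ** i
                pooled = F.adaptive_max_pool2d(head, (max(h // scale, 1), max(w // scale, 1)))
                feat = F.interpolate(conv(pooled), size=(h, w), mode="nearest")
            outs.append(feat)
        # The aggregated multi-scale map gates the input element-wise.
        attn = self.act(self.proj(torch.cat(outs, dim=1)))
        return x * attn


if __name__ == "__main__":
    x = torch.randn(1, 64, 56, 56)   # e.g., a mid-level facial feature map
    print(SAFMSketch(64)(x).shape)   # torch.Size([1, 64, 56, 56])

Each head past the first pools its channel group to a progressively coarser resolution, which is how a block of this kind injects global context and long-range dependencies into the current layer; a 3D variant, as the name 3D-SAFM suggests, would presumably extend the pooling and depthwise convolutions along a third axis. The repository linked above is the authoritative implementation.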
Journal introduction:
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human-factors engineers new to the field, will also occasionally be featured.