{"title":"基于前庭眼反射和GDENet的三维目标消歧凝视深度估计","authors":"Ting Lei, Leshan Wang, Jixiang Chen","doi":"10.1016/j.displa.2025.102978","DOIUrl":null,"url":null,"abstract":"<div><div>Gaze depth estimation is crucial in eye-based selection with 3D target occlusion. Previous work has done some research on gaze depth estimation, but the accuracy is limited. In this work, we propose a new method based on VOR (Vestibulo-Ocular Reflex), which is based on the fact that VOR helps us stabilize gaze on a target when the head moves by performing compensatory eye movement in the opposite direction, and the closer the target is, the more compensation is needed. In this work, we collected abundant head and eye data when performing VOR at different gaze depths, and explored the relationship between gaze depth and VOR motion through a simple user study. Then we designed a new temporal neural network GDENet (Gaze Depth Estimation Network), which adopts a new label representation combining classification and regression, and uses multiple supervision to predict the gaze depth from input VOR information. In the range of 0.2 m to 5 m, we achieve centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m). Finally, a user experiment indicates that our depth estimation method can be used in 3D target disambiguation and is suitable for various 3D scenarios.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102978"},"PeriodicalIF":3.7000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Gaze depth estimation using vestibulo-ocular reflex and GDENet for 3D target disambiguation\",\"authors\":\"Ting Lei, Leshan Wang, Jixiang Chen\",\"doi\":\"10.1016/j.displa.2025.102978\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Gaze depth estimation is crucial in eye-based selection with 3D target occlusion. Previous work has done some research on gaze depth estimation, but the accuracy is limited. In this work, we propose a new method based on VOR (Vestibulo-Ocular Reflex), which is based on the fact that VOR helps us stabilize gaze on a target when the head moves by performing compensatory eye movement in the opposite direction, and the closer the target is, the more compensation is needed. In this work, we collected abundant head and eye data when performing VOR at different gaze depths, and explored the relationship between gaze depth and VOR motion through a simple user study. Then we designed a new temporal neural network GDENet (Gaze Depth Estimation Network), which adopts a new label representation combining classification and regression, and uses multiple supervision to predict the gaze depth from input VOR information. In the range of 0.2 m to 5 m, we achieve centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m). 
Finally, a user experiment indicates that our depth estimation method can be used in 3D target disambiguation and is suitable for various 3D scenarios.</div></div>\",\"PeriodicalId\":50570,\"journal\":{\"name\":\"Displays\",\"volume\":\"87 \",\"pages\":\"Article 102978\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2025-01-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Displays\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0141938225000150\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225000150","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract
Gaze depth estimation is crucial for eye-based selection when 3D targets occlude one another. Prior work has studied gaze depth estimation, but its accuracy remains limited. In this work, we propose a new method based on the VOR (vestibulo-ocular reflex), exploiting the fact that the VOR stabilizes gaze on a target during head movement by producing compensatory eye movement in the opposite direction, and that the closer the target is, the more compensation is needed. We collected extensive head and eye movement data while participants performed the VOR at different gaze depths, and explored the relationship between gaze depth and VOR motion through a simple user study. We then designed a new temporal neural network, GDENet (Gaze Depth Estimation Network), which adopts a new label representation combining classification and regression and uses multiple supervision to predict gaze depth from the input VOR information. In the range of 0.2 m to 5 m, we achieve centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m²). Finally, a user experiment indicates that our depth estimation method can be used for 3D target disambiguation and is suitable for various 3D scenarios.
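The abstract's central geometric claim, that nearer targets demand more compensatory eye rotation, follows from simple VOR kinematics. The sketch below is a minimal geometric baseline rather than the paper's GDENet model: assuming the eye sits at a fixed offset r (here 0.1 m, an illustrative value) in front of the head's rotation axis, perfect fixation on a target at depth d requires a VOR gain of about 1 + r/d, so depth can be read off as d ≈ r / (gain − 1). The function and parameter names are assumptions for illustration, not from the paper.

```python
import numpy as np

EYE_AXIS_OFFSET_M = 0.1  # assumed eye-to-rotation-axis offset; illustrative, not from the paper

def depth_from_vor_gain(head_vel, eye_vel, r=EYE_AXIS_OFFSET_M):
    """Estimate gaze depth (m) from head and eye angular velocities (deg/s).

    Idealized VOR kinematics: fixating a target at depth d while the head
    rotates requires eye-in-head velocity of -(1 + r/d) * head velocity, so
    gain g = |eye_vel / head_vel| = 1 + r/d, giving d = r / (g - 1).
    """
    head_vel = np.asarray(head_vel, dtype=float)
    eye_vel = np.asarray(eye_vel, dtype=float)
    moving = np.abs(head_vel) > 5.0          # keep only samples with real head motion
    g = np.median(np.abs(eye_vel[moving] / head_vel[moving]))
    if g <= 1.0:                             # gain <= 1 carries no usable depth signal
        return float("inf")
    return r / (g - 1.0)

# Synthetic check: a target at 0.5 m implies gain 1 + 0.1/0.5 = 1.2.
t = np.linspace(0.0, 1.0, 120)
head = 30.0 * np.sin(2.0 * np.pi * 2.0 * t)              # head yaw velocity, deg/s
eye = -1.2 * head + np.random.normal(0.0, 0.5, t.shape)  # noisy compensatory eye velocity
print(f"estimated depth: {depth_from_vor_gain(head, eye):.2f} m")  # ~0.5 m
```

This baseline degrades as depth grows (the gain approaches 1, consistent with the 0.2 m to 5 m range quoted in the abstract), which is presumably why the authors train a network instead. The abstract also mentions a label representation combining classification and regression; the exact scheme is not given here, but one common construction (purely an assumption for illustration) discretizes the depth range into bins, classifies the bin, and regresses the normalized within-bin residual:

```python
import numpy as np

D_MIN, D_MAX = 0.2, 5.0   # depth range quoted in the abstract
N_BINS = 24               # bin count is an arbitrary illustrative choice
EDGES = np.linspace(D_MIN, D_MAX, N_BINS + 1)

def encode_depth(d):
    """Depth (m) -> (bin index for the classification head,
    residual in [0, 1] for the regression head)."""
    b = min(int(np.searchsorted(EDGES, d, side="right")) - 1, N_BINS - 1)
    residual = (d - EDGES[b]) / (EDGES[b + 1] - EDGES[b])
    return b, residual

def decode_depth(b, residual):
    """Invert encode_depth: combine the classified bin with the regressed residual."""
    return EDGES[b] + residual * (EDGES[b + 1] - EDGES[b])

b, res = encode_depth(1.37)
assert abs(decode_depth(b, res) - 1.37) < 1e-9
```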
Journal introduction:
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.