{"title":"Gaze depth estimation using vestibulo-ocular reflex and GDENet for 3D target disambiguation","authors":"Ting Lei, Leshan Wang, Jixiang Chen","doi":"10.1016/j.displa.2025.102978","DOIUrl":null,"url":null,"abstract":"<div><div>Gaze depth estimation is crucial in eye-based selection with 3D target occlusion. Previous work has done some research on gaze depth estimation, but the accuracy is limited. In this work, we propose a new method based on VOR (Vestibulo-Ocular Reflex), which is based on the fact that VOR helps us stabilize gaze on a target when the head moves by performing compensatory eye movement in the opposite direction, and the closer the target is, the more compensation is needed. In this work, we collected abundant head and eye data when performing VOR at different gaze depths, and explored the relationship between gaze depth and VOR motion through a simple user study. Then we designed a new temporal neural network GDENet (Gaze Depth Estimation Network), which adopts a new label representation combining classification and regression, and uses multiple supervision to predict the gaze depth from input VOR information. In the range of 0.2 m to 5 m, we achieve centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m). Finally, a user experiment indicates that our depth estimation method can be used in 3D target disambiguation and is suitable for various 3D scenarios.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102978"},"PeriodicalIF":3.7000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938225000150","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0
Abstract
Gaze depth estimation is crucial for eye-based selection when 3D targets occlude one another. Prior work has explored gaze depth estimation, but its accuracy remains limited. In this work, we propose a new method based on the vestibulo-ocular reflex (VOR): when the head moves, the VOR stabilizes gaze on a target by producing compensatory eye movement in the opposite direction, and the closer the target, the larger the compensation required. We collected extensive head and eye movement data during VOR at different gaze depths and explored the relationship between gaze depth and VOR motion through a preliminary user study. We then designed GDENet (Gaze Depth Estimation Network), a temporal neural network that adopts a new label representation combining classification and regression and uses multiple supervision to predict gaze depth from input VOR information. Over the range of 0.2 m to 5 m, it achieves centimeter-level prediction accuracy (MAE = 0.145 m, MSE = 0.031 m²). Finally, a user experiment indicates that our depth estimation method can be used for 3D target disambiguation and is suitable for a variety of 3D scenarios.
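The depth cue the method exploits can be made concrete with first-order geometry: if the head rotates about an axis roughly a distance r behind the eyes, keeping fixation on a target at depth d requires a VOR gain (eye angular velocity over head angular velocity) of about 1 + r/d, so d ≈ r / (gain − 1). The sketch below illustrates this baseline relation only; it is not the paper's GDENet, and the function name, the window-based gain fit, and the axis_to_eye value (~0.1 m) are illustrative assumptions.

```python
import numpy as np

def estimate_depth_from_vor(head_vel, eye_vel, axis_to_eye=0.1,
                            min_head_vel=5.0):
    """Estimate gaze depth (m) from one window of VOR motion.

    head_vel, eye_vel: angular velocities in deg/s over a short window
    of head rotation; eye_vel is the compensatory (counter-rotating)
    component, taken with positive sign.
    axis_to_eye: assumed distance (m) from the head's rotation axis to
    the eye center; ~0.1 m is a rough anthropometric guess.
    """
    head_vel = np.asarray(head_vel, dtype=float)
    eye_vel = np.asarray(eye_vel, dtype=float)

    # Keep only samples with enough head motion for a stable ratio.
    mask = np.abs(head_vel) > min_head_vel
    if not mask.any():
        return None

    # VOR gain: least-squares fit of eye_vel = gain * head_vel.
    h, e = head_vel[mask], eye_vel[mask]
    gain = np.dot(h, e) / np.dot(h, h)

    # First-order geometry: a target at depth d needs
    # gain ~ 1 + axis_to_eye / d, hence d ~ axis_to_eye / (gain - 1).
    if gain <= 1.0:
        return np.inf  # gain near 1 corresponds to a far fixation
    return axis_to_eye / (gain - 1.0)
```

In practice this ratio is noisy (blinks, saccades, vergence, imperfect gain), which is presumably why the paper learns the depth mapping with a temporal network rather than applying the closed-form relation directly.

The abstract also mentions a label representation combining classification and regression with multiple supervision. A minimal PyTorch sketch of one plausible reading follows: discretize the 0.2 m to 5 m range into bins, then supervise a bin classifier, a within-bin residual, and the depth reconstructed from the soft bin distribution. The class name, bin count, and loss weights are hypothetical, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinResidualDepthLoss(nn.Module):
    """Hypothetical classification + regression depth supervision.

    Depth in [d_min, d_max] is split into num_bins uniform bins. One
    head classifies the bin, another regresses the normalized offset
    inside the bin, and a third term supervises the depth reconstructed
    from the soft bin distribution (one form of multiple supervision).
    """

    def __init__(self, d_min=0.2, d_max=5.0, num_bins=32,
                 w_cls=1.0, w_res=1.0, w_depth=1.0):
        super().__init__()
        self.d_min, self.num_bins = d_min, num_bins
        self.bin_width = (d_max - d_min) / num_bins
        self.weights = (w_cls, w_res, w_depth)

    def forward(self, cls_logits, residual, depth_gt):
        # cls_logits: (B, num_bins); residual: (B,) in [0, 1);
        # depth_gt: (B,) ground-truth depth in meters.
        pos = (depth_gt - self.d_min) / self.bin_width
        bin_gt = pos.long().clamp(0, self.num_bins - 1)
        res_gt = pos - bin_gt.float()

        loss_cls = F.cross_entropy(cls_logits, bin_gt)
        loss_res = F.l1_loss(residual, res_gt)

        # Soft-argmax reconstruction: expected bin center under the
        # predicted distribution, supervised directly in meters.
        centers = self.d_min + (torch.arange(
            self.num_bins, device=depth_gt.device) + 0.5) * self.bin_width
        depth_pred = (cls_logits.softmax(dim=-1) * centers).sum(dim=-1)
        loss_depth = F.l1_loss(depth_pred, depth_gt)

        w_cls, w_res, w_depth = self.weights
        return w_cls * loss_cls + w_res * loss_res + w_depth * loss_depth
```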
About the journal:
Displays is the international journal covering the research and development of display technology, the effective presentation and perception of information, and applications and systems, including the display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers who are new to the field, will also occasionally be featured.