OSTGazeNet: One-stage Trainable 2D Gaze Estimation Network

Heeyoung Joo, Min-Soo Ko, Hyok Song
{"title":"OSTGazeNet: One-stage Trainable 2D Gaze Estimation Network","authors":"Heeyoung Joo, Min-Soo Ko, Hyok Song","doi":"10.1109/ICTC52510.2021.9620812","DOIUrl":null,"url":null,"abstract":"Gaze estimation refers to estimating the user's gaze information, such as the gaze direction. Recently, various deep learning-based methods for gaze estimation which are robust to lighting conditions or occlusions have been introduced. Previously proposed methods for gaze estimation were mainly composed of 2 different steps, one is for localizing the eye landmarks and another is for regressing the gaze direction. In this paper, we propose a novel one-stage trainable 2D gaze estimation network, namely One-stage Trainable 2D Gaze Estimation Network(OSTGazeNet), in which the localization of eye landmarks and the regression of the 2D gaze direction vector are integrated into the one-stage trainable deep learning network. OSTGazeNet used Stacked Hourglass Network as a backbone network, and the pixel coordinates of eye landmarks in 2D image space and a normalized gaze direction vector in the spherical coordinate system are estimated simultaneously in OSTGazeNet. About the learning of the network, we used synthetic eye images dataset named UnityEyes for training and also used an unconstrained eye images dataset named MPIIGaze for the evaluation. We performed experiments to determine the hyperparameters of learning and used mean square error as a performance metric. The best performance was a mean square error of 0.038838, and the inference time was 42 FPS.","PeriodicalId":299175,"journal":{"name":"2021 International Conference on Information and Communication Technology Convergence (ICTC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Information and Communication Technology Convergence (ICTC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTC52510.2021.9620812","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Gaze estimation refers to estimating a user's gaze information, such as the gaze direction. Recently, various deep learning-based gaze estimation methods that are robust to lighting conditions and occlusions have been introduced. Previously proposed methods were mainly composed of two separate steps: one for localizing the eye landmarks and another for regressing the gaze direction. In this paper, we propose a novel one-stage trainable 2D gaze estimation network, the One-stage Trainable 2D Gaze Estimation Network (OSTGazeNet), in which the localization of eye landmarks and the regression of the 2D gaze direction vector are integrated into a single trainable deep learning network. OSTGazeNet uses the Stacked Hourglass Network as its backbone and simultaneously estimates the pixel coordinates of the eye landmarks in 2D image space and a normalized gaze direction vector in the spherical coordinate system. For training, we used the synthetic eye image dataset UnityEyes, and for evaluation, we used the unconstrained eye image dataset MPIIGaze. We performed experiments to determine the learning hyperparameters, using mean squared error as the performance metric. The best result was a mean squared error of 0.038838, at an inference speed of 42 FPS.
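The abstract describes a single network with two outputs: eye-landmark pixel coordinates and a 2D gaze direction given as spherical angles (pitch, yaw). Below is a minimal PyTorch sketch of that two-head, one-stage structure, not the authors' implementation: `SimpleHourglass` is a placeholder for the Stacked Hourglass backbone, and the landmark count, input size, and loss weighting are assumptions chosen for illustration.

```python
# Minimal sketch of a one-stage landmark + gaze network (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleHourglass(nn.Module):
    """Placeholder backbone; the paper uses a Stacked Hourglass Network."""
    def __init__(self, in_ch=1, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.encoder(x)

class OSTGazeNetSketch(nn.Module):
    def __init__(self, num_landmarks=18, feat_ch=64):  # landmark count assumed
        super().__init__()
        self.backbone = SimpleHourglass(feat_ch=feat_ch)
        # Heatmap head: one channel per eye landmark.
        self.heatmap_head = nn.Conv2d(feat_ch, num_landmarks, 1)
        # Gaze head: pooled features -> two spherical angles (pitch, yaw).
        self.gaze_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 2)
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.heatmap_head(feats), self.gaze_head(feats)

def soft_argmax(heatmaps):
    """Differentiable pixel coordinates from landmark heatmaps."""
    b, c, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, c, -1), dim=-1).view(b, c, h, w)
    xs = torch.linspace(0, w - 1, w, device=heatmaps.device)
    ys = torch.linspace(0, h - 1, h, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expected x per landmark
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expected y per landmark
    return torch.stack([x, y], dim=-1)       # (B, num_landmarks, 2)

# One joint training step: MSE on landmark coordinates and gaze angles,
# so both tasks are learned in a single stage.
model = OSTGazeNetSketch()
img = torch.randn(4, 1, 36, 60)              # eye-crop size assumed
heatmaps, gaze = model(img)
landmarks = soft_argmax(heatmaps)
gt_landmarks = torch.rand(4, 18, 2) * 36     # hypothetical ground truth
gt_gaze = torch.rand(4, 2)
loss = F.mse_loss(landmarks, gt_landmarks) + F.mse_loss(gaze, gt_gaze)
loss.backward()
```

The key design point the abstract emphasizes is that both heads share one backbone and one loss, so the whole network is trained end to end in a single stage rather than training a landmark detector and a gaze regressor separately.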