OSTGazeNet: One-stage Trainable 2D Gaze Estimation Network

Heeyoung Joo, Min-Soo Ko, Hyok Song
{"title":"OSTGazeNet: One-stage Trainable 2D Gaze Estimation Network","authors":"Heeyoung Joo, Min-Soo Ko, Hyok Song","doi":"10.1109/ICTC52510.2021.9620812","DOIUrl":null,"url":null,"abstract":"Gaze estimation refers to estimating the user's gaze information, such as the gaze direction. Recently, various deep learning-based methods for gaze estimation which are robust to lighting conditions or occlusions have been introduced. Previously proposed methods for gaze estimation were mainly composed of 2 different steps, one is for localizing the eye landmarks and another is for regressing the gaze direction. In this paper, we propose a novel one-stage trainable 2D gaze estimation network, namely One-stage Trainable 2D Gaze Estimation Network(OSTGazeNet), in which the localization of eye landmarks and the regression of the 2D gaze direction vector are integrated into the one-stage trainable deep learning network. OSTGazeNet used Stacked Hourglass Network as a backbone network, and the pixel coordinates of eye landmarks in 2D image space and a normalized gaze direction vector in the spherical coordinate system are estimated simultaneously in OSTGazeNet. About the learning of the network, we used synthetic eye images dataset named UnityEyes for training and also used an unconstrained eye images dataset named MPIIGaze for the evaluation. We performed experiments to determine the hyperparameters of learning and used mean square error as a performance metric. The best performance was a mean square error of 0.038838, and the inference time was 42 FPS.","PeriodicalId":299175,"journal":{"name":"2021 International Conference on Information and Communication Technology Convergence (ICTC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Information and Communication Technology Convergence (ICTC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTC52510.2021.9620812","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Gaze estimation refers to estimating a user's gaze information, such as the gaze direction. Recently, various deep learning-based gaze estimation methods that are robust to lighting conditions and occlusions have been introduced. Previously proposed methods were mainly composed of two separate steps: one for localizing the eye landmarks and another for regressing the gaze direction. In this paper, we propose a novel one-stage trainable 2D gaze estimation network, the One-stage Trainable 2D Gaze Estimation Network (OSTGazeNet), in which the localization of eye landmarks and the regression of the 2D gaze direction vector are integrated into a single trainable deep learning network. OSTGazeNet uses the Stacked Hourglass Network as its backbone and simultaneously estimates the pixel coordinates of the eye landmarks in 2D image space and a normalized gaze direction vector in the spherical coordinate system. For training, we used the synthetic eye image dataset UnityEyes, and for evaluation, we used the unconstrained eye image dataset MPIIGaze. We performed experiments to determine the learning hyperparameters, using mean squared error as the performance metric. The best result was a mean squared error of 0.038838, at an inference speed of 42 FPS.
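The abstract describes a single network with two outputs: eye-landmark pixel coordinates and a 2D gaze direction given as spherical angles (pitch, yaw). Below is a minimal PyTorch sketch of that two-head, one-stage structure, not the authors' implementation: `SimpleHourglass` is a placeholder for the Stacked Hourglass backbone, and the landmark count, input size, and loss weighting are assumptions chosen for illustration.

```python
# Minimal sketch of a one-stage landmark + gaze network (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleHourglass(nn.Module):
    """Placeholder backbone; the paper uses a Stacked Hourglass Network."""
    def __init__(self, in_ch=1, feat_ch=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.encoder(x)

class OSTGazeNetSketch(nn.Module):
    def __init__(self, num_landmarks=18, feat_ch=64):  # landmark count assumed
        super().__init__()
        self.backbone = SimpleHourglass(feat_ch=feat_ch)
        # Heatmap head: one channel per eye landmark.
        self.heatmap_head = nn.Conv2d(feat_ch, num_landmarks, 1)
        # Gaze head: pooled features -> two spherical angles (pitch, yaw).
        self.gaze_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 2)
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.heatmap_head(feats), self.gaze_head(feats)

def soft_argmax(heatmaps):
    """Differentiable pixel coordinates from landmark heatmaps."""
    b, c, h, w = heatmaps.shape
    probs = F.softmax(heatmaps.view(b, c, -1), dim=-1).view(b, c, h, w)
    xs = torch.linspace(0, w - 1, w, device=heatmaps.device)
    ys = torch.linspace(0, h - 1, h, device=heatmaps.device)
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # expected x per landmark
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # expected y per landmark
    return torch.stack([x, y], dim=-1)       # (B, num_landmarks, 2)

# One joint training step: MSE on landmark coordinates and gaze angles,
# so both tasks are learned in a single stage.
model = OSTGazeNetSketch()
img = torch.randn(4, 1, 36, 60)              # eye-crop size assumed
heatmaps, gaze = model(img)
landmarks = soft_argmax(heatmaps)
gt_landmarks = torch.rand(4, 18, 2) * 36     # hypothetical ground truth
gt_gaze = torch.rand(4, 2)
loss = F.mse_loss(landmarks, gt_landmarks) + F.mse_loss(gaze, gt_gaze)
loss.backward()
```

The key design point the abstract emphasizes is that both heads share one backbone and one loss, so the whole network is trained end to end in a single stage rather than training a landmark detector and a gaze regressor separately.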