{"title":"OSTGazeNet: One-stage Trainable 2D Gaze Estimation Network","authors":"Heeyoung Joo, Min-Soo Ko, Hyok Song","doi":"10.1109/ICTC52510.2021.9620812","journal":{"name":"2021 International Conference on Information and Communication Technology Convergence (ICTC)"},"publicationDate":"2021-10-20","citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICTC52510.2021.9620812"}
Gaze estimation is the task of estimating a user's gaze information, such as the gaze direction. Recently, various deep learning-based gaze estimation methods that are robust to lighting conditions or occlusions have been introduced. Previously proposed methods mainly consisted of two separate steps: one localizes the eye landmarks and the other regresses the gaze direction. In this paper, we propose a novel one-stage trainable 2D gaze estimation network, the One-stage Trainable 2D Gaze Estimation Network (OSTGazeNet), in which eye landmark localization and regression of the 2D gaze direction vector are integrated into a single trainable deep learning network. OSTGazeNet uses a Stacked Hourglass Network as its backbone and simultaneously estimates the pixel coordinates of the eye landmarks in 2D image space and a normalized gaze direction vector in the spherical coordinate system. We trained the network on UnityEyes, a synthetic eye image dataset, and evaluated it on MPIIGaze, an unconstrained eye image dataset. We ran experiments to select the training hyperparameters, using mean squared error as the performance metric. The best configuration achieved a mean squared error of 0.038838, with inference running at 42 FPS.
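To make the two simultaneous outputs and the error metric concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract gives neither the loss formulation nor the angle convention, so the pitch/yaw-to-vector mapping (a camera looking along -z), the `joint_loss` function, and its `gaze_weight` parameter are all assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average of the squared element-wise differences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def spherical_to_unit_vector(pitch: float, yaw: float) -> Tuple[float, float, float]:
    """Convert a gaze direction given as (pitch, yaw) in radians to a 3D unit vector.

    Assumed convention (not stated in the abstract): the camera looks along -z,
    so pitch = yaw = 0 maps to (0, 0, -1).
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def joint_loss(
    pred_landmarks: Sequence[float],
    true_landmarks: Sequence[float],
    pred_gaze: Sequence[float],
    true_gaze: Sequence[float],
    gaze_weight: float = 1.0,
) -> float:
    """Hypothetical one-stage objective: landmark MSE plus weighted gaze MSE.

    Landmarks are flattened pixel coordinates [x0, y0, x1, y1, ...];
    gaze is a (pitch, yaw) pair in radians.
    """
    landmark_term = mean_squared_error(pred_landmarks, true_landmarks)
    gaze_term = mean_squared_error(pred_gaze, true_gaze)
    return landmark_term + gaze_weight * gaze_term
```

Summing the two terms into one scalar is what would let a single network train on both tasks in one backward pass, which is the sense of "one-stage" used here. How the two terms are actually balanced in OSTGazeNet is not specified in the abstract.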