Gaze Tracking in 3D Space with a Convolution Neural Network “See What I See”
A. Adiba, Satoshi Asatani, Seiichi Tagawa, H. Niioka, Jun Miyake
2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), October 2017. DOI: 10.1109/AIPR.2017.8457962
This paper presents an integrated architecture for estimating gaze vectors under unrestricted head motion. Previous approaches focused on estimating gaze toward a small planar screen and therefore require calibration before use. Using a Kinect device, we develop a method that relies on depth sensing for robust and accurate head-pose tracking, and that obtains the eye-in-head gaze direction by training a Neural Network (NN) model on visual data from eye images. Our model is a Convolutional Neural Network (CNN) with five layers: two convolution-pooling pairs and a fully connected output layer. The filters are learned in an unsupervised way by applying k-means clustering to random patches of the images. The learned filters feed the convolution layers, each of which is followed by a pooling layer that reduces the resolution of the feature maps and the sensitivity of the output to shifts and distortions. Finally, the fully connected layer serves as a classifier, with its weights obtained through a feed-forward process. We reconstruct gaze vectors from a set of head and eye pose orientations. The results of this approach suggest a gaze estimation error of 5 degrees. This model is more accurate than a simple NN and an adaptive linear regression (ALR) approach.
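The abstract describes learning the convolution filters without supervision, by running k-means clustering over random patches of the eye images. The sketch below illustrates that idea; the patch size (8×8), filter count (32), patch count, and grayscale input are illustrative assumptions, since the paper's exact settings are not reproduced here.

```python
# Unsupervised filter learning by k-means on random image patches,
# in the spirit of the abstract's description. Patch size, filter
# count, and grayscale input are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def learn_filters(images, patch_size=8, n_filters=32, n_patches=10000, seed=0):
    """images: array of shape (N, H, W) holding grayscale eye images."""
    rng = np.random.default_rng(seed)
    n, h, w = images.shape
    patches = np.empty((n_patches, patch_size * patch_size))
    for i in range(n_patches):
        img = images[rng.integers(n)]
        y = rng.integers(h - patch_size + 1)
        x = rng.integers(w - patch_size + 1)
        patches[i] = img[y:y + patch_size, x:x + patch_size].ravel()
    # Normalize each patch (zero mean, unit variance) before clustering.
    patches -= patches.mean(axis=1, keepdims=True)
    patches /= patches.std(axis=1, keepdims=True) + 1e-8
    km = KMeans(n_clusters=n_filters, n_init=10, random_state=seed).fit(patches)
    # Each cluster centroid becomes one convolution filter.
    return km.cluster_centers_.reshape(n_filters, patch_size, patch_size)
```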
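The five-layer layout (two convolution-pooling pairs plus a fully connected output layer) can be sketched as a plain forward pass. Everything below is illustrative rather than the paper's implementation: the filter banks come from the k-means step above, pooling is assumed to be 2×2 max pooling, and sharing a single 2-D filter across all input maps is a simplification of a full CNN.

```python
# Minimal forward pass matching the layer layout in the abstract:
# conv -> pool -> conv -> pool -> fully connected output.
import numpy as np
from scipy.signal import correlate2d

def relu(x):
    return np.maximum(x, 0.0)

def conv(maps, filters):
    # maps: (C, H, W); filters: (F, k, k). Each output map sums one
    # filter's response over all input maps (valid correlation).
    # Sharing one 2-D filter across input maps is a simplification.
    C, H, W = maps.shape
    F, k, _ = filters.shape
    out = np.zeros((F, H - k + 1, W - k + 1))
    for f in range(F):
        for c in range(C):
            out[f] += correlate2d(maps[c], filters[f], mode="valid")
    return relu(out)

def max_pool(maps, s=2):
    # Reduce feature-map resolution, which also reduces sensitivity
    # to small shifts and distortions, as the abstract notes.
    C, H, W = maps.shape
    h2, w2 = H // s, W // s
    m = maps[:, :h2 * s, :w2 * s].reshape(C, h2, s, w2, s)
    return m.max(axis=(2, 4))

def forward(image, f1, f2, w_out, b_out):
    x = max_pool(conv(image[None], f1))   # convolution-pooling pair 1
    x = max_pool(conv(x, f2))             # convolution-pooling pair 2
    return w_out @ x.ravel() + b_out      # fully connected output layer
```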
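Reconstructing the 3-D gaze vector combines the depth-based head pose with the eye-in-head direction predicted by the network. Below is a minimal sketch assuming a yaw/pitch/roll head-pose parameterization and a camera-style axis convention; the paper's actual conventions are not specified in the abstract and may differ.

```python
# Hypothetical gaze-vector reconstruction from head pose (yaw, pitch,
# roll) and eye-in-head angles (yaw, pitch), all in radians.
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Intrinsic Z-Y-X rotation built from the head-pose angles."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return rz @ ry @ rx

def gaze_vector(head_yaw, head_pitch, head_roll, eye_yaw, eye_pitch):
    # Eye-in-head direction as a unit vector (assumed convention:
    # looking down the -z axis when eye yaw = pitch = 0).
    g_eye = np.array([
        np.cos(eye_pitch) * np.sin(eye_yaw),
        np.sin(eye_pitch),
        -np.cos(eye_pitch) * np.cos(eye_yaw),
    ])
    # Rotate into world coordinates using the depth-sensed head pose.
    return rotation_matrix(head_yaw, head_pitch, head_roll) @ g_eye
```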