Mixed Effects Neural Networks (MeNets) With Applications to Gaze Estimation

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Pub Date: 2019-06-01 | DOI: 10.1109/CVPR.2019.00793

Yunyang Xiong, Hyunwoo Kim, Vikas Singh

Pages: 7735-7744

Citations: 74

Abstract

There is much interest in computer vision in using commodity hardware for gaze estimation. A number of papers have shown that algorithms based on deep convolutional architectures are approaching accuracies at which streaming data from mass-market devices can offer good gaze tracking performance, although a gap remains between what is possible and the performance users will expect in real deployments. We observe that one obvious avenue for improvement relates to a mismatch between some basic technical assumptions behind most existing approaches and the statistical properties of the data used for training. Specifically, most training datasets involve tens of users with a few hundred (or more) repeated acquisitions per user. The non-i.i.d. nature of this data suggests that better estimation may be possible if the model explicitly made use of such “repeated measurements” from each user, as is commonly done in classical statistical analysis using so-called mixed effects models. The goal of this paper is to adapt these “mixed effects” ideas from statistics within a deep neural network architecture for gaze estimation based on eye images. Such a formulation seeks to specifically utilize information regarding the hierarchical structure of the training data: each node in the hierarchy is a user who provides tens or hundreds of repeated samples. This modification yields an architecture that offers state-of-the-art performance on various publicly available datasets, improving results by 10-20%.
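The "repeated measurements" idea can be illustrated with the classical random-intercept linear model that mixed effects analysis starts from. The sketch below is not the MeNets architecture and does not reproduce the paper's method. It is a minimal toy example on synthetic data, assuming a single scalar feature, a per-user additive offset, and known variance components (`var_u`, `var_e`); the helper name `fit_random_intercepts` is hypothetical. It compares a pooled fit that treats all samples as i.i.d. with a fit that also estimates a shrunken per-user offset.

```python
import numpy as np

def fit_random_intercepts(residuals, groups, var_u, var_e):
    """Per-group offsets as shrunken residual means (random-intercept BLUP).

    var_u and var_e are the assumed between-user and noise variances;
    in practice they would be estimated rather than given.
    """
    offsets = {}
    for g in np.unique(groups):
        r = residuals[groups == g]
        # Shrinkage factor in (0, 1): more samples per user -> less shrinkage.
        shrink = len(r) * var_u / (len(r) * var_u + var_e)
        offsets[g] = shrink * r.mean()
    return offsets

# Synthetic data: tens of users, a few hundred repeated samples each.
rng = np.random.default_rng(0)
n_users, n_per_user = 20, 200
groups = np.repeat(np.arange(n_users), n_per_user)
x = rng.normal(size=groups.size)
true_u = rng.normal(scale=1.0, size=n_users)  # hidden per-user offset
y = 2.0 * x + true_u[groups] + rng.normal(scale=0.5, size=groups.size)

# Fixed effects only: pooled least squares that ignores user identity.
X = np.column_stack([x, np.ones_like(x)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pooled_pred = X @ beta

# Random effects: per-user offsets estimated from the pooled residuals.
offsets = fit_random_intercepts(y - pooled_pred, groups, var_u=1.0, var_e=0.25)
mixed_pred = pooled_pred + np.array([offsets[g] for g in groups])

# In-sample errors; the mixed fit absorbs the per-user offsets.
pooled_mse = np.mean((y - pooled_pred) ** 2)
mixed_mse = np.mean((y - mixed_pred) ** 2)
print(f"pooled MSE: {pooled_mse:.3f}  mixed MSE: {mixed_mse:.3f}")
```

In this toy setting the pooled model leaves the per-user offsets in its residuals, while the random-intercept step recovers most of them. Both errors are measured in-sample, so the comparison only shows the effect of modeling the grouping, not generalization to unseen users.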

