Training Person-Specific Gaze Estimators from User Interactions with Multiple Devices

Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Pub Date: 2018-04-21. DOI: 10.1145/3173574.3174198

Xucong Zhang, Michael Xuelin Huang, Yusuke Sugano, A. Bulling


Citation count: 50

Abstract

Learning-based gaze estimation has significant potential to enable attentive user interfaces and gaze-based interaction on the billions of camera-equipped handheld devices and ambient displays. While training accurate person- and device-independent gaze estimators remains challenging, person-specific training is feasible but requires tedious data collection for each target device. To address these limitations, we present the first method to train person-specific gaze estimators across multiple devices. At the core of our method is a single convolutional neural network with shared feature extraction layers and device-specific branches that we train from face images and corresponding on-screen gaze locations. Detailed evaluations on a new dataset of interactions with five common devices (mobile phone, tablet, laptop, desktop computer, smart TV) and three common applications (mobile game, text editing, media center) demonstrate the significant potential of cross-device training. We further explore training with gaze locations derived from natural interactions, such as mouse or touch input.
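The architecture described above (shared feature extraction layers followed by device-specific branches) can be illustrated with a minimal sketch. This is not the authors' implementation: a single dense layer with ReLU stands in for the shared convolutional layers, the weights are random and untrained, and the names `MultiDeviceGazeSketch`, `make_matrix`, `matvec`, and the dimensions are all hypothetical. It only shows how one shared representation can feed a separate output branch per device, each producing an on-screen (x, y) location.

```python
import random

# The five device types named in the abstract.
DEVICES = ["phone", "tablet", "laptop", "desktop", "tv"]


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class MultiDeviceGazeSketch:
    """Hypothetical sketch: one shared layer, one output branch per device."""

    def __init__(self, input_dim, feature_dim, devices, seed=0):
        rng = random.Random(seed)
        # Shared across all devices (assumption: a dense layer stands in
        # for the paper's shared convolutional feature extraction layers).
        self.shared = make_matrix(feature_dim, input_dim, rng)
        # One branch per device, mapping shared features to an (x, y) location.
        self.branches = {d: make_matrix(2, feature_dim, rng) for d in devices}

    def features(self, face_vector):
        # Linear layer followed by ReLU.
        return [max(0.0, v) for v in matvec(self.shared, face_vector)]

    def predict(self, face_vector, device):
        # Same shared features for every device; only the branch differs.
        return tuple(matvec(self.branches[device], self.features(face_vector)))


model = MultiDeviceGazeSketch(input_dim=8, feature_dim=4, devices=DEVICES)
face = [0.5] * 8  # placeholder for a flattened face image
gaze_phone = model.predict(face, "phone")
gaze_tv = model.predict(face, "tv")
```

In a trained version of such a design, samples from every device would update the shared layer, while each branch would be updated only by samples from its own device. That is one plausible reading of how data from multiple devices can be combined for a single user; the abstract does not specify the training procedure.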


