An egocentric video and eye-tracking dataset for visual search in convenience stores

IF 4.3 | CAS Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding Pub Date : 2024-08-28 DOI:10.1016/j.cviu.2024.104129

Yinan Wang, Sansitha Panchadsaram, Rezvan Sherkati, James J. Clark

Volume 248, Article 104129. Full text: https://www.sciencedirect.com/science/article/pii/S1077314224002108

Citations: 0

Abstract

We introduce an egocentric video and eye-tracking dataset comprising 108 first-person videos of 36 shoppers searching for three different products (orange juice, KitKat chocolate bars, and canned tuna) in a convenience store, along with the frame-centered eye fixation locations for each video frame. The dataset also includes demographic information about each participant in the form of an 11-question survey. The paper describes two applications of the dataset: an analysis of eye fixations during search in the store, and the training of a clustered saliency model for predicting saliency of viewers engaged in product search in the store. The fixation analysis shows that fixation duration statistics are very similar to those found in image and video viewing, suggesting that similar visual processing is employed during search in 3D environments and during viewing of imagery on computer screens. A clustering technique was applied to the questionnaire data, which resulted in two clusters being detected. Based on these clusters, personalized saliency prediction models were trained on the store fixation data; these improved saliency prediction on the store video data compared to state-of-the-art universal saliency prediction methods.
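The questionnaire clustering step mentioned in the abstract can be illustrated with a small sketch. The abstract does not name the clustering technique, so plain k-means is used here purely as a stand-in. The 1-to-5 answer scale, the function names `synthetic_survey` and `kmeans`, and the survey responses themselves are assumptions: the data is randomly generated and is not drawn from the dataset.

```python
import random


def synthetic_survey(n_participants=36, n_questions=11, seed=1):
    """Generate made-up survey answers (assumed 1-5 scale), NOT real data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_participants):
        # First half answers low, second half answers high, so that
        # two groups exist for the clustering to find.
        lo, hi = (1, 3) if i < n_participants // 2 else (3, 5)
        rows.append([float(rng.randint(lo, hi)) for _ in range(n_questions)])
    return rows


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=2, iters=50, seed=0):
    """Lloyd's k-means on lists of floats; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct participants.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each participant joins the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    rows = synthetic_survey()
    labels, centroids = kmeans(rows, k=2)
    for c in range(2):
        print(f"cluster {c}: {labels.count(c)} participants")
```

In a pipeline like the one the abstract describes, the resulting cluster label for each participant would decide which subset of fixation data is used to train each personalized saliency model. How the authors actually encode the survey answers and choose the number of clusters is not stated in the abstract.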


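Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days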

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems