Weakly Supervised Attended Object Detection Using Gaze Data as Annotations

Michele Mazzamuto, F. Ragusa, Antonino Furnari, G. Signorello, G. Farinella
{"title":"Weakly Supervised Attended Object Detection Using Gaze Data as Annotations","authors":"Michele Mazzamuto, F. Ragusa, Antonino Furnari, G. Signorello, G. Farinella","doi":"10.48550/arXiv.2204.07090","DOIUrl":null,"url":null,"abstract":"We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of costs and time, we propose a weakly supervised version of the task which leans only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We hence compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following url: https://iplab.dmi.unict.it/WS_OBJ_DET/","PeriodicalId":74527,"journal":{"name":"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing","volume":"78 1","pages":"263-274"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2204.07090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

We consider the problem of detecting and recognizing the objects observed by visitors (i.e., attended objects) in cultural sites from egocentric vision. A standard approach to the problem involves detecting all objects and selecting the one which best overlaps with the gaze of the visitor, measured through a gaze tracker. Since labeling large amounts of data to train a standard object detector is expensive in terms of cost and time, we propose a weakly supervised version of the task which relies only on gaze data and a frame-level label indicating the class of the attended object. To study the problem, we present a new dataset composed of egocentric videos and gaze coordinates of subjects visiting a museum. We then compare three different baselines for weakly supervised attended object detection on the collected data. Results show that the considered approaches achieve satisfactory performance in a weakly supervised manner, which allows for significant time savings with respect to a fully supervised detector based on Faster R-CNN. To encourage research on the topic, we publicly release the code and the dataset at the following URL: https://iplab.dmi.unict.it/WS_OBJ_DET/
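The "standard approach" mentioned in the abstract (detect all objects, then keep the detection that best overlaps the visitor's gaze) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the selection rule (prefer the smallest box containing the gaze point, fall back to the box whose centre is nearest to it) and the function name are assumptions for the sake of the example.

```python
import numpy as np

def select_attended_object(boxes, labels, gaze_xy):
    """Pick the detection that best matches the gaze point (illustrative rule).

    boxes   : (N, 4) array of [x1, y1, x2, y2] detections from any object detector.
    labels  : length-N list of class labels, one per detection.
    gaze_xy : (x, y) gaze coordinates from the gaze tracker, in pixels.
    """
    boxes = np.asarray(boxes, dtype=float)
    gx, gy = gaze_xy

    # Boxes that contain the gaze point.
    inside = (
        (boxes[:, 0] <= gx) & (gx <= boxes[:, 2]) &
        (boxes[:, 1] <= gy) & (gy <= boxes[:, 3])
    )

    if inside.any():
        # Among boxes containing the gaze, prefer the tightest (smallest) one.
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        areas[~inside] = np.inf  # exclude boxes that do not contain the gaze
        idx = int(np.argmin(areas))
    else:
        # No box contains the gaze: fall back to the nearest box centre.
        centres = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                            (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
        dists = np.linalg.norm(centres - np.array([gx, gy]), axis=1)
        idx = int(np.argmin(dists))

    return boxes[idx], labels[idx]
```

Such a pipeline requires a fully supervised detector (e.g., Faster R-CNN) trained on box-level annotations; the weakly supervised formulation studied in the paper avoids this cost by using only the gaze coordinates and a frame-level class label as supervision.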