Hand-Priming in Object Localization for Assistive Egocentric Vision.

IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision Pub Date : 2020-03-01 Epub Date: 2020-05-14 DOI:10.1109/wacv45572.2020.9093353

Kyungjun Lee, Abhinav Shrivastava, Hernisa Kacorri

{"title":"Hand-Priming in Object Localization for Assistive Egocentric Vision.","authors":"Kyungjun Lee, Abhinav Shrivastava, Hernisa Kacorri","doi":"10.1109/wacv45572.2020.9093353","DOIUrl":null,"url":null,"abstract":"<p><p>Egocentric vision holds great promises for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users often tend to include their hand either interacting with the object that they wish to recognize or simply placing it in proximity for better camera aiming. We propose localization models that leverage the presence of the hand as the contextual information for priming the center area of the object of interest. In our approach, hand segmentation is fed to either the entire localization network or its last convolutional layers. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves higher precision than other approaches, such as fine-tuning, multi-class, and multi-task learning, which also encode hand-object interactions in localization.</p>","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"2020 ","pages":"3411-3421"},"PeriodicalIF":0.0000,"publicationDate":"2020-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7423407/pdf/nihms-1609047.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/wacv45572.2020.9093353","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2020/5/14 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Egocentric vision holds great promises for increasing access to visual information and improving the quality of life for people with visual impairments, with object recognition being one of the daily challenges for this population. While we strive to improve recognition performance, it remains difficult to identify which object is of interest to the user; the object may not even be included in the frame due to challenges in camera aiming without visual feedback. Also, gaze information, commonly used to infer the area of interest in egocentric vision, is often not dependable. However, blind users often tend to include their hand either interacting with the object that they wish to recognize or simply placing it in proximity for better camera aiming. We propose localization models that leverage the presence of the hand as the contextual information for priming the center area of the object of interest. In our approach, hand segmentation is fed to either the entire localization network or its last convolutional layers. Using egocentric datasets from sighted and blind individuals, we show that the hand-priming achieves higher precision than other approaches, such as fine-tuning, multi-class, and multi-task learning, which also encode hand-object interactions in localization.

查看原文本刊更多论文

辅助性眼心视觉物体定位中的手部定位

以自我为中心的视觉在增加视觉信息的获取和改善视障人士的生活质量方面大有可为，而物体识别则是视障人士面临的日常挑战之一。在我们努力提高识别性能的同时，识别用户感兴趣的物体仍然很困难；在没有视觉反馈的情况下，由于相机瞄准方面的挑战，物体甚至可能不在画面中。此外，在自我中心视觉中通常用于推断感兴趣区域的注视信息往往并不可靠。然而，盲人用户通常倾向于将他们的手与他们希望识别的物体进行互动，或者只是将其放在附近以更好地瞄准摄像机。我们提出的定位模型可以利用手的存在作为背景信息，以引出感兴趣物体的中心区域。在我们的方法中，手部分割信息被输入到整个定位网络或其最后的卷积层中。通过使用视力正常和失明人士的自我中心数据集，我们证明了手部定位比其他方法（如微调、多类和多任务学习）实现了更高的精确度，其他方法也在定位中编码了手与物体之间的相互作用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision

自引率

0.00%

发文量