Solid and Effective Upper Limb Segmentation in Egocentric Vision

The 26th International Conference on 3D Web Technology. Pub Date: 2021-11-08. DOI: 10.1145/3485444.3495179

M. Gruosso, N. Capece, U. Erra


Citations: 2

Abstract

Upper limb segmentation in egocentric vision is a challenging and nearly unexplored task that extends the well-known hand localization problem. It can be crucial for a realistic representation of users' limbs in immersive and interactive environments, such as VR/MR applications designed for web browsers, which offer a general-purpose solution suitable for any device. Existing hand and arm segmentation approaches require a large amount of well-annotated data, so different annotation techniques have been designed and several datasets created. Such datasets are often limited to synthetic and semi-synthetic data that do not include the whole limb and differ significantly from real data, leading to poor performance in many realistic cases. To overcome the limitations of previous methods and the challenges inherent in both egocentric vision and segmentation, we collected a large-scale, comprehensive dataset and trained several segmentation networks based on the state-of-the-art DeepLabv3+ model. The dataset consists of 46 thousand real-life, well-labeled RGB images with a great variety of skin colors, clothes, occlusions, and lighting conditions. In particular, we carefully selected the best data from existing datasets and added our EgoCam dataset, which includes new images with accurate labels. Finally, we extensively evaluated the trained networks in unconstrained real-world environments to find the best model configuration for this task, achieving promising results in diverse scenarios. The code, the collected egocentric upper limb segmentation dataset, and a video demo of our work will be available on the project page.
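The abstract does not say which metric was used to compare the trained networks. As an illustration only, the sketch below assumes intersection over union (IoU), a common score for binary segmentation masks, where 1 marks a limb pixel and 0 marks background. The names `mask_iou` and `mean_iou` are hypothetical and are not taken from the authors' code.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 pixel labels (assumed encoding)


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly, so IoU is defined as 1.0 here.
    return intersection / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average per-image IoU over a set of predictions."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("need the same non-zero number of masks")
    return sum(mask_iou(p, t) for p, t in zip(preds, targets)) / len(preds)


if __name__ == "__main__":
    predicted = [[0, 1, 1], [0, 1, 0]]
    labeled = [[0, 1, 0], [0, 1, 1]]
    print(mask_iou(predicted, labeled))
```

Averaging per-image IoU is one of several conventions; pooling pixel counts over the whole test set before dividing is another, and the two can rank models differently.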

