VL4Pose: Active Learning Through Out-Of-Distribution Detection For Pose Estimation

BMVC: Proceedings of the British Machine Vision Conference. Pub Date: 2022-10-12. DOI: 10.48550/arXiv.2210.06028

Megh Shukla, Roshan Roy, Pankaj Singh, Shuaib Ahmed, Alexandre Alahi


Citation count: 4

Abstract

Advances in computing have enabled widespread access to pose estimation, creating new sources of data streams. Unlike mock set-ups for data collection, tapping into these data streams through on-device active learning allows us to sample directly from the real world and improve the spread of the training distribution. However, on-device computing power is limited, so any candidate active learning algorithm should have a low compute footprint while also being reliable. Although multiple algorithms cater to pose estimation, they either use extensive compute to power state-of-the-art results or are not competitive in low-resource settings. We address this limitation with VL4Pose (Visual Likelihood For Pose Estimation), a first-principles approach for active learning through out-of-distribution detection. We begin with a simple premise: pose estimators often predict incoherent poses for out-of-distribution samples. Hence, can we identify a distribution of poses the model has been trained on, in order to flag incoherent poses the model is unsure of? Our solution involves modelling the pose through a simple parametric Bayesian network trained via maximum likelihood estimation. Poses incurring a low likelihood within our framework are therefore treated as out-of-distribution samples, making them suitable candidates for annotation. We also observe two useful side-outcomes: VL4Pose in principle yields better uncertainty estimates by unifying joint-level and pose-level ambiguity, and it has the unintentional but welcome ability to perform pose refinement in limited scenarios. We perform qualitative and quantitative experiments on three datasets, MPII, LSP and ICVL, spanning human and hand pose estimation. Finally, we note that VL4Pose is simple, computationally inexpensive and competitive, making it suitable for challenging tasks such as on-device active learning.
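
The core idea in the abstract (fit a parametric model of pose by maximum likelihood, then send low-likelihood poses for annotation) can be illustrated with a toy sketch. This is not the authors' implementation: the 4-joint `SKELETON`, the choice of an independent Gaussian over each limb's (dx, dy) offset, and the function names `fit_mle`, `log_likelihood` and `select_for_annotation` are all assumptions made purely for illustration.

```python
import math
import random
from statistics import fmean, pvariance

# Hypothetical 4-joint chain (hip -> spine -> neck -> head); pairs are (parent, child).
SKELETON = [(0, 1), (1, 2), (2, 3)]

def limb_offsets(pose):
    """Return the (dx, dy) offset of every child joint relative to its parent."""
    return [(pose[c][0] - pose[p][0], pose[c][1] - pose[p][1]) for p, c in SKELETON]

def fit_mle(poses, eps=1e-6):
    """Closed-form Gaussian MLE: per-limb, per-axis mean and (biased) variance."""
    per_pose = [limb_offsets(pose) for pose in poses]
    params = []
    for limb in range(len(SKELETON)):
        axes = []
        for axis in (0, 1):
            values = [offsets[limb][axis] for offsets in per_pose]
            mu = fmean(values)
            # eps keeps the variance strictly positive (assumed safeguard).
            axes.append((mu, pvariance(values, mu) + eps))
        params.append(axes)
    return params

def log_likelihood(pose, params):
    """Sum of Gaussian log-densities over all limbs and axes (tree factorisation)."""
    total = 0.0
    for offset, axes in zip(limb_offsets(pose), params):
        for value, (mu, var) in zip(offset, axes):
            total += -0.5 * (math.log(2 * math.pi * var) + (value - mu) ** 2 / var)
    return total

def select_for_annotation(poses, params, budget):
    """Return indices of the `budget` poses with the lowest log-likelihood."""
    ranked = sorted(range(len(poses)), key=lambda i: log_likelihood(poses[i], params))
    return ranked[:budget]

# Synthetic "training" poses: a roughly upright chain with small jitter.
_rng = random.Random(0)
train_poses = [
    [(_rng.gauss(0, 0.05), j + _rng.gauss(0, 0.05)) for j in range(4)]
    for _ in range(200)
]
model = fit_mle(train_poses)

# Unlabelled pool: two coherent poses and one incoherent (scrambled) pose.
pool = [
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)],
    [(0.0, 0.0), (2.0, -1.0), (-3.0, 0.5), (1.0, 4.0)],
    [(0.1, 0.0), (0.1, 1.0), (0.1, 2.0), (0.1, 3.0)],
]
print(select_for_annotation(pool, model, budget=1))
```

In this toy setting the scrambled pose has limb offsets far from those seen in training, so it receives the lowest likelihood and is ranked first for annotation. Because fitting reduces to per-limb means and variances, the cost of scoring a pose is linear in the number of limbs, which is in the spirit of the low compute footprint the abstract emphasises.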

