SOMA: Solving Optical Marker-Based MoCap Automatically

N. Ghorbani, Michael J. Black
{"title":"SOMA: Solving Optical Marker-Based MoCap Automatically","authors":"N. Ghorbani, Michael J. Black","doi":"10.1109/ICCV48922.2021.01093","DOIUrl":null,"url":null,"abstract":"Marker-based optical motion capture (mocap) is the \"gold standard\" method for acquiring accurate 3D human motion in computer vision, medicine, and graphics. The raw output of these systems are noisy and incomplete 3D points or short tracklets of points. To be useful, one must associate these points with corresponding markers on the captured subject; i.e. \"labelling\". Given these labels, one can then \"solve\" for the 3D skeleton or body surface mesh. Commercial auto-labeling tools require a specific calibration procedure at capture time, which is not possible for archival data. Here we train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points, labels them at scale without any calibration data, independent of the capture technology, and requiring only minimal human intervention. Our key insight is that, while labeling point clouds is highly ambiguous, the 3D body provides strong constraints on the solution that can be exploited by a learning-based method. To enable learning, we generate massive training sets of simulated noisy and ground truth mocap markers animated by 3D bodies from AMASS. SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body and an optimal transport layer to constrain the assignment (labeling) problem while rejecting outliers. We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is more accurate and robust than existing state of the art research methods and can be applied where commercial systems cannot. We automatically label over 8 hours of archival mocap data across 4 different datasets captured using various technologies and output SMPL-X body models. The model and data is released for research purposes at https://soma.is.tue.mpg.de/.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"11097-11106"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV48922.2021.01093","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

Marker-based optical motion capture (mocap) is the "gold standard" method for acquiring accurate 3D human motion in computer vision, medicine, and graphics. The raw output of these systems is noisy and incomplete 3D points or short tracklets of points. To be useful, one must associate these points with the corresponding markers on the captured subject, i.e., "labeling". Given these labels, one can then "solve" for the 3D skeleton or body surface mesh. Commercial auto-labeling tools require a specific calibration procedure at capture time, which is not possible for archival data. Here we train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale without any calibration data, independently of the capture technology, and with only minimal human intervention. Our key insight is that, while labeling point clouds is highly ambiguous, the 3D body provides strong constraints on the solution that can be exploited by a learning-based method. To enable learning, we generate massive training sets of simulated noisy and ground-truth mocap markers animated by 3D bodies from AMASS. SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body and an optimal transport layer to constrain the assignment (labeling) problem while rejecting outliers. We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is more accurate and robust than existing state-of-the-art research methods and can be applied where commercial systems cannot. We automatically label over 8 hours of archival mocap data across 4 different datasets captured using various technologies and output SMPL-X body models. The model and data are released for research purposes at https://soma.is.tue.mpg.de/.
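To make the assignment step concrete, below is a minimal sketch of optimal-transport-style labeling with an appended "null" bin that rejects outliers. This is an illustration under our own assumptions, not SOMA's released code: the plain log-domain Sinkhorn normalization, the function names (`sinkhorn_log`, `label_points`), and the toy affinity scores are all hypothetical, and the paper's actual layer operates on learned affinities inside the network.

```python
# A minimal sketch of optimal-transport point labeling, assuming a
# log-domain Sinkhorn normalization and an extra "null" column that
# absorbs ghost points (outliers). Illustrative only.
import numpy as np
from scipy.special import logsumexp

def sinkhorn_log(scores, n_iters=50):
    """Alternately normalize rows and columns of exp(scores) in log space,
    driving the matrix toward a doubly stochastic soft assignment."""
    log_p = scores.astype(float)
    for _ in range(n_iters):
        log_p -= logsumexp(log_p, axis=1, keepdims=True)  # rows sum to 1
        log_p -= logsumexp(log_p, axis=0, keepdims=True)  # columns sum to 1
    return log_p

def label_points(scores, null_score=0.0):
    """Map each captured point to a marker label, or to -1 for outliers.

    scores: (P, M) affinity matrix (P mocap points, M marker labels).
    A constant-score column is appended as a "dustbin" so ghost points
    can be rejected instead of being forced onto a real label.
    """
    P, M = scores.shape
    augmented = np.concatenate([scores, np.full((P, 1), null_score)], axis=1)
    log_p = sinkhorn_log(augmented)
    labels = log_p.argmax(axis=1)
    labels[labels == M] = -1  # the appended column means "no label"
    return labels

# Toy usage: 4 detected points, 3 marker labels; point 3 is a ghost point.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 3))
scores[0, 2] += 5.0  # point 0 strongly matches label 2
scores[1, 0] += 5.0  # point 1 strongly matches label 0
scores[2, 1] += 5.0  # point 2 strongly matches label 1
print(label_points(scores))  # expected: [ 2  0  1 -1 ]
```

For simplicity the toy problem is square after appending the null column (4 points, 3 labels + 1 bin); a general, unbalanced point-to-label problem also needs dustbin rows, so that both ghost points and occluded markers can go unmatched.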