Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective.

IEEE Transactions on Pattern Analysis and Machine Intelligence. Pub Date: 2024-08-16. DOI: 10.1109/TPAMI.2024.3443922

Hui Deng, Tong Zhang, Yuchao Dai, Jiawei Shi, Yiran Zhong, Hongdong Li



Abstract

Directly regressing non-rigid shape and camera pose from individual 2D frames is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. Such a frame-by-frame 3D reconstruction pipeline overlooks the inherently spatio-temporal nature of NRSfM, i.e., reconstructing a 3D sequence from an input 2D sequence. In this paper, we propose to solve deep sparse NRSfM from a sequence-to-sequence translation perspective, where the input 2D keypoint sequence is taken as a whole to reconstruct the corresponding 3D keypoint sequence in a self-supervised manner. First, we apply a shape-motion predictor to the input sequence to obtain an initial sequence of shapes and corresponding motions. Then, we propose the Context Layer, which enables the deep learning framework to impose sequence-level constraints based on the structural characteristics of non-rigid sequences. The Context Layer uses multi-head attention (MHA) as its core to impose a self-expressiveness regularity on non-rigid sequences and combines it with temporal encoding; the two act together to constrain non-rigid sequences within the deep framework. Experimental results on datasets including Human3.6M, CMU Mocap, and InterHand demonstrate the superiority of our framework. The code will be made publicly available.
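
To make the idea of attention-based self-expressiveness more concrete, the following is a minimal sketch and not the authors' Context Layer. It assumes a single attention head, identity query/key/value projections (no learned weights), a standard sinusoidal temporal encoding, and shapes flattened to vectors of length 3P; the function names `temporal_encoding`, `softmax`, and `self_expressive_attention` are hypothetical. Each frame's shape is re-expressed as a convex combination of all shapes in the sequence, so the attention matrix plays the role of a self-expressive coefficient matrix.

```python
import math


def temporal_encoding(num_frames, dim):
    """Sinusoidal encoding: one vector of length `dim` per frame index."""
    enc = []
    for t in range(num_frames):
        row = []
        for i in range(dim):
            angle = t / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        enc.append(row)
    return enc


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_expressive_attention(shapes):
    """Re-express every shape as a weighted sum of all shapes in the sequence.

    shapes: list of F flattened 3D shapes, each a list of 3P floats.
    Returns (refined_shapes, weights), where weights[i][j] is how much
    frame j contributes to the refined shape of frame i.
    """
    num_frames = len(shapes)
    dim = len(shapes[0])
    enc = temporal_encoding(num_frames, dim)

    # Assumption: queries and keys are the shapes plus temporal encoding,
    # with identity projections instead of learned ones.
    feats = [[s + e for s, e in zip(shape, pe)] for shape, pe in zip(shapes, enc)]

    scale = math.sqrt(dim)
    weights, refined = [], []
    for q in feats:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in feats]
        w = softmax(scores)
        weights.append(w)
        # Values are the raw shapes, so each output is a convex
        # combination of the input shapes.
        refined.append(
            [sum(w[j] * shapes[j][d] for j in range(num_frames)) for d in range(dim)]
        )
    return refined, weights
```

Calling `self_expressive_attention` on an initial sequence of F predicted shapes returns F refined shapes and an F×F weight matrix whose rows sum to one. A full model would presumably use several heads with learned projections and train end to end against a self-supervised objective, none of which is shown here.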


