Geometric Consistency-Guaranteed Spatio-Temporal Transformer for Unsupervised Multiview 3-D Pose Estimation
Kaiwen Dong, Kévin Riou, Jingwen Zhu, Andréas Pastor, Kévin Subrin, Yu Zhou, Xiao Yun, Yanjing Sun, Patrick Le Callet
IEEE Transactions on Instrumentation and Measurement, published 2024-09-02
DOI: 10.1109/TIM.2024.3440376
https://ieeexplore.ieee.org/document/10663570/
Citations: 0
Abstract
Unsupervised 3-D pose estimation has gained prominence because labeled 3-D training data are difficult to acquire. Despite promising progress, unsupervised approaches still lag behind supervised methods in performance. Two factors impede their progress: incomplete geometric constraints and inadequate interaction among spatial, temporal, and multiview features. This article introduces an unsupervised pipeline that uses calibrated camera parameters as geometric constraints across views and coordinate spaces, optimizing the model by minimizing inconsistencies between the 2-D input pose and the reprojection of the predicted 3-D pose. The pipeline uses a novel hierarchical cross transformer (HCT) that encodes higher-level information by enabling interactions among hierarchical features carrying different levels of temporal, spatial, and cross-view information. Because it minimizes reliance on human-specific components, the HCT shows potential for adaptation to other pose estimation tasks. To validate this adaptability, we connect human pose estimation with scene pose estimation and introduce the dynamic-keypoints-3-D (DKs-3D) dataset, tailored for 3-D scene pose estimation in robotic manipulation. Experiments on two 3-D human pose estimation datasets show that our method achieves new state-of-the-art performance among weakly supervised and unsupervised approaches. Experiments on DKs-3D confirm its adaptability and establish the first benchmark for unsupervised 2-D-to-3-D scene pose lifting.
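The abstract states that training minimizes the inconsistency between the 2-D input pose and the reprojection of the predicted 3-D pose, using calibrated cameras to move between views and coordinate spaces. The sketch below illustrates a generic multiview reprojection-consistency term of that kind under assumptions not given in the abstract: a pinhole camera model with extrinsics X_cam = R X_world + t, a 3-D prediction expressed in a reference camera's frame, and a mean Euclidean pixel error. The names `PinholeCamera` and `multiview_reprojection_loss` are hypothetical, and the paper's actual loss, conventions, and weighting may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def matvec(m: Sequence[Sequence[float]], v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return (
        m[0][0] * v[0] + m[0][1] * v[1] + m[0][2] * v[2],
        m[1][0] * v[0] + m[1][1] * v[1] + m[1][2] * v[2],
        m[2][0] * v[0] + m[2][1] * v[1] + m[2][2] * v[2],
    )


class PinholeCamera:
    """Hypothetical calibrated pinhole camera; assumed convention X_cam = R @ X_world + t."""

    def __init__(self, fx: float, fy: float, cx: float, cy: float,
                 R: Sequence[Sequence[float]], t: Vec3) -> None:
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy
        self.R = [list(row) for row in R]
        self.t = t

    def world_to_cam(self, p: Vec3) -> Vec3:
        r = matvec(self.R, p)
        return (r[0] + self.t[0], r[1] + self.t[1], r[2] + self.t[2])

    def cam_to_world(self, p: Vec3) -> Vec3:
        # R is orthonormal, so its inverse is its transpose: X_world = R^T (X_cam - t).
        rt = [[self.R[j][i] for j in range(3)] for i in range(3)]
        return matvec(rt, (p[0] - self.t[0], p[1] - self.t[1], p[2] - self.t[2]))

    def project(self, p_cam: Vec3) -> Vec2:
        # Perspective division followed by the intrinsic mapping to pixels.
        x, y, z = p_cam
        return (self.fx * x / z + self.cx, self.fy * y / z + self.cy)


def multiview_reprojection_loss(pred_ref: List[Vec3], ref_cam: PinholeCamera,
                                cams: List[PinholeCamera],
                                obs_2d: List[List[Vec2]]) -> float:
    """Mean pixel distance between observed 2-D joints and reprojected 3-D joints.

    pred_ref: J predicted joints in the reference camera's coordinate frame.
    obs_2d[k][j]: observed 2-D position of joint j in view k.
    """
    world = [ref_cam.cam_to_world(p) for p in pred_ref]  # reference camera -> world
    total, count = 0.0, 0
    for cam, obs in zip(cams, obs_2d):
        for p_w, (u_obs, v_obs) in zip(world, obs):
            u, v = cam.project(cam.world_to_cam(p_w))  # world -> view k -> pixels
            total += math.hypot(u - u_obs, v - v_obs)
            count += 1
    return total / count


# Usage with two synthetic views: a reference camera and one rotated 30 degrees about y.
c, s = math.cos(math.radians(30)), math.sin(math.radians(30))
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
ref = PinholeCamera(1000, 1000, 512, 512, identity, (0.0, 0.0, 0.0))
side = PinholeCamera(1000, 1000, 512, 512, [[c, 0, s], [0, 1, 0], [-s, 0, c]], (-1.0, 0.0, 0.5))
cams = [ref, side]

gt = [(0.0, 0.0, 4.0), (0.2, -0.5, 4.1), (-0.2, 0.5, 3.9)]
obs = [[cam.project(cam.world_to_cam(ref.cam_to_world(p))) for p in gt] for cam in cams]
noisy = [(x + 0.05, y, z) for x, y, z in gt]
print(multiview_reprojection_loss(gt, ref, cams, obs))     # 0.0: observations come from the same pose
print(multiview_reprojection_loss(noisy, ref, cams, obs))  # positive: shifted pose disagrees in both views
```

Because every view's error is measured against the same predicted 3-D pose, a prediction that fits one view but violates the calibrated geometry of another is penalized, which is the intuition behind using camera parameters as a cross-view constraint.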
About the journal:
Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support for the establishment and maintenance of technical standards in the field of Instrumentation and Measurement.