From visual understanding to 6D pose reconstruction: A cutting-edge review of deep learning-based object pose estimation

IF 3.7 Zone 2 (Engineering & Technology) Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays Pub Date : 2025-05-14 DOI:10.1016/j.displa.2025.103069

Jing Wang, Guohan Liu, Wenxin Ding, Yuying Li, Wanying Song

Displays, Volume 89, Article 103069. https://www.sciencedirect.com/science/article/pii/S0141938225001064

Citations: 0

Abstract

Object pose estimation is a key problem in computer vision and plays an important role in tasks such as autonomous driving and robot navigation. However, most existing reviews cover both traditional and deep learning methods and fail to comprehensively define instance-level and category-level object pose estimation methods. To help researchers better understand this field, this paper summarizes instance-level, category-level, unseen-object, and articulated-body pose estimation methods in detail, filling the gap left by existing reviews in these emerging areas. Organized by the modality of the input data, it highlights the implementations, application domains, training paradigms, network architectures, and strengths and weaknesses of deep learning-based object pose estimation methods, and compares their performance on different datasets. In addition, this paper surveys the evaluation metrics and benchmark datasets in this field, analyzes their scope and applicability in different scenarios, and shows the key roles these metrics and datasets play in promoting technological progress and solving practical problems. In view of current technical bottlenecks, this paper also outlines future development directions drawn from cutting-edge work on multi-view fusion, cross-modal data integration, and novel neural networks, offering new ideas and references for advancing object pose estimation.

Source journal

Displays (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore

4.60

Self-citation rate

25.60%

Articles published

138

Review time

92 days

Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.