Jing Wang, Guohan Liu, Wenxin Ding, Yuying Li, Wanying Song
From visual understanding to 6D pose reconstruction: A cutting-edge review of deep learning-based object pose estimation
Displays, volume 89, Article 103069 (Journal Article). Published 2025-05-14. Impact factor 3.7; JCR Q1 (Computer Science, Hardware & Architecture).
DOI: 10.1016/j.displa.2025.103069
https://www.sciencedirect.com/science/article/pii/S0141938225001064
Citations: 0
Abstract
Object pose estimation is a key problem in computer vision and plays an important role in tasks such as autonomous driving and robot navigation. However, most existing reviews cover both traditional and deep learning methods and do not comprehensively define instance-level and category-level object pose estimation. To help researchers better understand the field, this paper summarizes instance-level, category-level, unseen-object, and articulated-body pose estimation methods in detail, filling a gap in how existing reviews treat these emerging areas. Deep learning-based object pose estimation methods are grouped by the modality of their input data; for each group, the paper highlights implementations, application domains, training paradigms, network architectures, and strengths and weaknesses, and compares performance on different datasets. In addition, the paper surveys the evaluation metrics and benchmark datasets in the field, analyzes their scope and applicability in different scenarios, and shows the role these metrics and datasets play in driving technical progress and solving practical problems. In view of current technical bottlenecks, the paper also looks ahead to future directions, drawing on recent work in multi-view fusion, cross-modal data integration, and novel neural networks, which offers new ideas and references for advancing object pose estimation.
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.