Ying-Kun Wu; Junaed Sattar
IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 5002-5009 (JCR Q2, Robotics)
DOI: 10.1109/LRA.2025.3557235 | Published: 2025-04-02
https://ieeexplore.ieee.org/document/10947328/
Stereo-Based 3D Human Pose Estimation for Underwater Robots Without 3D Supervision
In this paper, we propose a novel deep learning-based 3D underwater human pose estimator capable of providing metric 3D poses of scuba divers from stereo image pairs. While existing research has made significant advancements in 3D human pose estimation, most methods rely on 3D ground truth for training, which is challenging to acquire in dynamic environments where traditional motion capture systems are impractical to deploy. To overcome this, our approach leverages epipolar geometry to derive 3D information from 2D estimations. Our method estimates semantic keypoints while capturing their corresponding disparity from binocular perspectives, thus avoiding challenges in calibrating for multi-view setups or scale-ambiguity in monocular settings. Additionally, to reduce the sensitivity of our method to 2D annotation accuracy, we propose an auto-refinement pipeline to automatically correct biases introduced by human labeling. Experiments demonstrate that our approach significantly improves performance compared to previous state-of-the-art methods in different environments, including but not limited to underwater scenarios, while being efficient enough to run on limited-capacity edge devices.
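The core geometric idea the abstract relies on — recovering metric 3D position from a keypoint's disparity across a rectified stereo pair, with no 3D ground truth required — can be sketched as follows. This is a generic illustration of stereo back-projection, not the authors' implementation; the function name and the calibration values (focal length, principal point, baseline) are hypothetical.

```python
import numpy as np

def keypoints_to_metric_3d(kps_left, disparities, fx, fy, cx, cy, baseline):
    """Back-project 2D keypoints with per-keypoint disparity into metric
    3D camera coordinates, assuming a rectified stereo pair.

    kps_left:    (N, 2) array of (u, v) pixel coordinates in the left image
    disparities: (N,) horizontal disparities d = u_left - u_right, in pixels
    fx, fy:      focal lengths in pixels; cx, cy: principal point in pixels
    baseline:    stereo baseline in meters
    """
    kps = np.asarray(kps_left, dtype=float)
    d = np.asarray(disparities, dtype=float)
    # Depth from disparity: Z = fx * B / d (valid only where d > 0)
    z = fx * baseline / d
    # Back-project pixel coordinates through the pinhole model
    x = (kps[:, 0] - cx) * z / fx
    y = (kps[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Hypothetical calibration: fx = fy = 700 px, principal point (320, 240),
# 12 cm baseline. The first keypoint sits on the optical axis, so its
# metric position is (0, 0, Z) with Z = 700 * 0.12 / 84 = 1.0 m.
pts = keypoints_to_metric_3d(
    kps_left=[[320.0, 240.0], [390.0, 240.0]],
    disparities=[84.0, 84.0],
    fx=700.0, fy=700.0, cx=320.0, cy=240.0, baseline=0.12,
)
```

Because depth comes directly from the calibrated baseline, the result is in meters — which is why a binocular setup sidesteps the scale ambiguity inherent to monocular pose estimation.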
Journal description:
This journal publishes peer-reviewed articles that provide timely, concise accounts of innovative research ideas and application results, reporting significant theoretical findings and application case studies in robotics and automation.