Marco Job, David Botta, Victor Reijgwart, Luca Ebner, Andrej Studer, Roland Siegwart, Eleni Kelasidi
Frontiers in Robotics and AI, vol. 12, article 1609765. Published 2025-06-26. DOI: 10.3389/frobt.2025.1609765 (https://doi.org/10.3389/frobt.2025.1609765). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240768/pdf/
Leveraging learned monocular depth prediction for pose estimation and mapping on unmanned underwater vehicles.
This paper presents a general framework that integrates visual and acoustic sensor data to enhance localization and mapping in complex, highly dynamic underwater environments, with a particular focus on fish farming. The pipeline enables net-relative pose estimation for Unmanned Underwater Vehicles (UUVs) and depth prediction within net pens solely from visual data by combining deep learning-based monocular depth prediction with sparse depth priors derived from a classical Fast Fourier Transform (FFT)-based method. We further introduce a method to estimate a UUV's global pose by fusing these net-relative estimates with acoustic measurements, and demonstrate how the predicted depth images can be integrated into the wavemap mapping framework to generate detailed 3D maps in real time. Extensive evaluations on datasets collected in industrial-scale fish farms confirm that the presented framework can be used to accurately estimate a UUV's net-relative and global position in real time, and provide 3D maps suitable for autonomous navigation and inspection.
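The abstract does not spell out how the FFT-based method turns an image into sparse depth priors. One plausible reading, assumed here purely for illustration, is that a net with a known mesh size forms a periodic pattern in the image, so the dominant spatial frequency together with the camera focal length gives the distance to the net under a pinhole model. The function names, the focal length, the mesh size, and the synthetic image row below are all hypothetical and are not taken from the paper.

```python
import cmath
import math


def dominant_period_px(signal):
    """Return the period (in pixels) of the strongest non-DC frequency."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # remove DC so bin 0 cannot dominate
    best_k, best_mag = 1, 0.0
    # Naive DFT over the positive frequencies; adequate for a short example row.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k


def depth_from_mesh_period(period_px, focal_px, mesh_size_m):
    """Assumed pinhole model: a cell of size S at distance Z spans f * S / Z pixels."""
    return focal_px * mesh_size_m / period_px


# Synthetic image row (assumption): net cells repeat every 16 pixels.
row = [math.cos(2 * math.pi * i / 16) for i in range(256)]
period = dominant_period_px(row)
# Hypothetical camera and net parameters, chosen only for the example.
depth_m = depth_from_mesh_period(period, focal_px=800.0, mesh_size_m=0.025)
print(f"period = {period:.1f} px, depth = {depth_m:.2f} m")
```

A real image would need a 2D transform, windowing, and handling of perspective distortion when the net is not fronto-parallel. The sketch only shows why a known mesh size makes the dominant frequency informative about distance.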
About the journal:
Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.