Monocular Pose and Shape Reconstruction of Vehicles in UAV imagery using a Multi-task CNN

IF 3.3, CAS Zone 4 (Earth Sciences), JCR Q3 (Imaging Science & Photographic Technology)

PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science. Pub Date: 2024-09-16. DOI: 10.1007/s41064-024-00311-0

S. El Amrani Abouelassad, M. Mehltretter, F. Rottensteiner



Abstract

Estimating the pose and shape of vehicles from aerial images is an important, yet challenging task. While there are many existing approaches that use stereo images from street-level perspectives to reconstruct objects in 3D, the majority of aerial configurations used for purposes like traffic surveillance are limited to monocular images. Addressing this challenge, a Convolutional Neural Network-based method is presented in this paper, which jointly performs detection, pose, type and 3D shape estimation for vehicles observed in monocular UAV imagery. For this purpose, a robust 3D object model is used following the concept of an Active Shape Model. In addition, different variants of loss functions for learning 3D shape estimation are presented, focusing on the height component, which is particularly challenging to estimate from monocular near-nadir images. We also introduce a UAV-based dataset to evaluate our model in addition to an augmented version of the publicly available Hessigheim benchmark dataset. Our method yields promising results in pose and shape estimation: utilising images with a ground sampling distance (GSD) of 3 cm, it achieves median errors of up to 4 cm in position and 3° in orientation. Additionally, it achieves root mean square (RMS) errors of \(\pm 6\) cm in planimetry and \(\pm 18\) cm in height for keypoints defining the car shape.
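The abstract only names the concept of an Active Shape Model, so the following is a minimal, generic sketch of that idea rather than the authors' model: a vehicle shape is represented as a mean set of 3D keypoints plus a weighted sum of principal deformation modes learned from example shapes. The function names, the number of keypoints, the number of modes and the synthetic training data are all assumptions made for illustration.

```python
import numpy as np


def fit_shape_model(shapes, n_modes):
    """Learn a linear shape model from example keypoint sets.

    shapes: array of shape (n_samples, n_keypoints, 3), hypothetical
    training vehicles already aligned in a common body frame.
    Returns the mean shape vector, the principal deformation modes
    and the standard deviation along each mode.
    """
    n_samples = shapes.shape[0]
    flat = shapes.reshape(n_samples, -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are the principal directions of the centered data.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    stddevs = singular_values[:n_modes] / np.sqrt(n_samples - 1)
    return mean, modes, stddevs


def deform(mean, modes, stddevs, gamma):
    """Generate a shape: mean plus modes weighted by gamma (in std. dev. units)."""
    flat = mean + (np.asarray(gamma, dtype=float) * stddevs) @ modes
    return flat.reshape(-1, 3)


# Synthetic stand-in data (an assumption): 20 "vehicles" with 12 keypoints
# each, produced by randomly scaling one base shape along x, y and z.
rng = np.random.default_rng(0)
base = rng.uniform(-1.0, 1.0, size=(12, 3))
scales = rng.normal(1.0, 0.1, size=(20, 1, 3))
training = base * scales

mean, modes, stddevs = fit_shape_model(training, n_modes=2)
mean_shape = deform(mean, modes, stddevs, [0.0, 0.0])   # all weights zero
varied_shape = deform(mean, modes, stddevs, [2.0, 0.0])  # 2 std. dev. along mode 1
```

In a setup like this, a network would only need to predict the few mode weights (here `gamma`) instead of every keypoint coordinate, which is one common reason for using such a low-dimensional shape representation.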




Source journal

PFG-Journal of Photogrammetry Remote Sensing and Geoinformation Science (Physics and Astronomy: Instrumentation)

CiteScore

8.20

Self-citation rate

2.40%


Journal description: PFG is an international scholarly journal covering the progress and application of photogrammetric methods, remote sensing technology and the interconnected field of geoinformation science. It places special editorial emphasis on the communication of new methodologies in data acquisition and new approaches to the optimized processing and interpretation of all types of data acquired by photogrammetric methods, remote sensing, image processing and the computer-aided interpretation of such data in general. The journal hence addresses researchers and students of these disciplines at academic institutions and universities as well as downstream users in both the private sector and public administration. Founded in 1926 under the former name Bildmessung und Luftbildwesen, PFG is the world's oldest journal on photogrammetry. It is the official journal of the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).