{"title":"Monocular Object Detection & Localization on a 2D Plane Adapted to 360° Images without Retraining","authors":"A. Farid, O. Yoshie","doi":"10.1109/ICCRE57112.2023.10155610","DOIUrl":null,"url":null,"abstract":"Equirectangular 360° images have the property of encompassing the omnidirectional field-of-vision in a single one-shot image, which have benefits and interesting use-cases as a form of perception for robots and autonomous vehicles. It is thus reasonable to implement object detection and localization on such images to enrich the perception of the surroundings of a given robot. Even though object detection models that were trained by deep learning have seen massive developments over the years, they do not adequately address the spherical semantics of an equirectangular image without special modification; a single image represents an observation (a color sphere with an assumed constant radius) in the form of a 2D image that does not semantically connect the side edges that are in fact the same in the real physical world. As a result, objects that lie on those vertical edges are not correctly detected. In this paper, we address this main problem by describing a methodology that adapts to any pre-trained object detection model without any retraining necessary. This is achieved by first applying the calibration parameters of the utilized camera to obtain a spherically corrected equirectangular image, then inferencing bounding box locations based on a batch of one image and its horizontally shifted version. Afterwards, we select the correct bounding boxes based on positional criteria. Additionally, we utilize calibration to correctly map between image pixel positions and real-world spherical coordinates. This allows us to utilize the spherical coordinates to create an image-to-world homography (assumption a flat-surface topology), thus achieving object localization.","PeriodicalId":285164,"journal":{"name":"2023 8th International Conference on Control and Robotics Engineering (ICCRE)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 8th International Conference on Control and Robotics Engineering (ICCRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCRE57112.2023.10155610","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Equirectangular 360° images encompass the omnidirectional field of view in a single one-shot image, which makes them a beneficial and interesting form of perception for robots and autonomous vehicles. It is thus reasonable to implement object detection and localization on such images to enrich a robot's perception of its surroundings. Even though deep-learning-based object detection models have developed massively over the years, they do not adequately address the spherical semantics of an equirectangular image without special modification: a single image represents an observation (a color sphere with an assumed constant radius) as a 2D image whose left and right edges are not semantically connected, even though they coincide in the real physical world. As a result, objects that lie on those vertical edges are not correctly detected. In this paper, we address this main problem by describing a methodology that adapts any pre-trained object detection model without any retraining. This is achieved by first applying the calibration parameters of the camera to obtain a spherically corrected equirectangular image, then inferring bounding-box locations on a batch of one image and its horizontally shifted version. Afterwards, we select the correct bounding boxes based on positional criteria. Additionally, we utilize the calibration to map correctly between image pixel positions and real-world spherical coordinates. This allows us to use the spherical coordinates to create an image-to-world homography (assuming a flat-surface topology), thus achieving object localization.
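As a concrete illustration of the shift-and-select step described above, the following is a minimal Python sketch. The `detect` callable, the `seam_eps` margin, and the exact positional criterion (trusting the shifted view only for boxes that straddle the relocated seam) are illustrative assumptions; the abstract does not spell out the selection criteria in detail.

```python
import numpy as np

def detect_on_equirect(image, detect, seam_eps=2):
    """Detect objects in an equirectangular image without retraining.

    A sketch of the shift-and-select idea: run the detector on the image
    and on a copy rolled by half its width, keep seam-crossing boxes only
    from the shifted view, and map them back to the original frame.

    `detect` is assumed to map an HxWx3 array to a list of
    (x1, y1, x2, y2, score) tuples; any pre-trained model wrapped this
    way will do, and the model itself is left unchanged.
    """
    h, w = image.shape[:2]
    half = w // 2
    shifted = np.roll(image, half, axis=1)  # seam moves to the centre column

    kept = []
    for (x1, y1, x2, y2, s) in detect(image):
        # Positional criterion: trust the original view for boxes that do
        # not touch the left/right borders (the seam) of the image.
        if x1 > seam_eps and x2 < w - seam_eps:
            kept.append((x1, y1, x2, y2, s))

    for (x1, y1, x2, y2, s) in detect(shifted):
        # In the shifted view the old seam sits at x = w/2, so a box that
        # straddles the centre column is exactly a seam-crossing object.
        if x1 < half < x2:
            # Unshift: the resulting box has x1 > x2, meaning it wraps
            # across the right edge of the original image.
            kept.append(((x1 - half) % w, y1, (x2 - half) % w, y2, s))
    return kept
```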
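For the localization step, the abstract maps pixel positions to spherical coordinates and builds an image-to-world homography under a flat-surface assumption. The sketch below shows an equivalent direct computation for a flat ground plane: convert a pixel to longitude/latitude, cast a unit ray, and intersect it with the plane. The linear pixel-to-angle formulas, the axis convention, and `cam_height` are assumptions for illustration; the paper's calibration-based mapping would replace the linear formulas.

```python
import numpy as np

def pixel_to_ground(u, v, w, h_img, cam_height):
    """Map an equirectangular pixel to a point on a flat ground plane.

    Hypothetical illustration of the spherical-coordinate localization
    step: pixel -> (longitude, latitude) -> unit ray -> intersection with
    the ground plane z = -cam_height (camera at the origin, z up).
    Assumes a calibrated, spherically corrected equirectangular image.
    """
    lon = (u / w) * 2.0 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = np.pi / 2.0 - (v / h_img) * np.pi    # latitude in [pi/2, -pi/2]
    # Unit ray on the colour sphere.
    d = np.array([np.cos(lat) * np.cos(lon),
                  np.cos(lat) * np.sin(lon),
                  np.sin(lat)])
    if d[2] >= 0:
        return None  # ray points at or above the horizon: no ground hit
    t = -cam_height / d[2]                     # ray-plane intersection
    return (t * d)[:2]                         # (X, Y) on the plane
```

In this sketch, feeding the bottom-centre pixel of each kept bounding box through `pixel_to_ground` would give an estimate of where the object stands on the 2D plane, mirroring the flat-surface localization the abstract describes.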