Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna
arXiv - CS - Robotics, published 2024-09-18
DOI: arxiv-2409.12002 (https://doi.org/arxiv-2409.12002)
Citations: 0
Abstract
Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods on two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D dataset. Our approach demonstrates significant improvements in both object-instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM RGB-D), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available.
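The abstract describes a dual-path design: one path encodes RGB, the other encodes depth, and object instances are re-identified by comparing the resulting embeddings. As a rough illustration only (the paper's architecture is a transformer; the precomputed features, concatenation-based late fusion, and cosine matching below are simplified assumptions, not the authors' implementation), embedding-similarity ReID matching might look like:

```python
import numpy as np

def fuse(rgb_feat, depth_feat):
    # Late fusion of the two paths: concatenate the per-object RGB and
    # depth feature vectors, then L2-normalize the result.
    f = np.concatenate([rgb_feat, depth_feat], axis=-1)
    return f / np.linalg.norm(f, axis=-1, keepdims=True)

def reid_match(query, gallery):
    # Cosine similarity (features are unit-norm, so a dot product suffices);
    # return the index and score of the best-matching gallery object.
    sims = gallery @ query
    return int(np.argmax(sims)), float(np.max(sims))

# Toy example: three gallery objects with hypothetical 8-D RGB and
# 4-D depth features (dimensions chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
gallery_rgb = rng.normal(size=(3, 8))
gallery_depth = rng.normal(size=(3, 4))
gallery = fuse(gallery_rgb, gallery_depth)

# The query is a noisy re-observation of gallery object 1, as might occur
# when the same object is seen again from a different viewpoint.
query = fuse(gallery_rgb[1] + 0.05 * rng.normal(size=8),
             gallery_depth[1] + 0.05 * rng.normal(size=4))
idx, sim = reid_match(query, gallery)
```

The intuition for the multimodal design is visible even in this sketch: the depth features contribute geometric structure that is unaffected by illumination, so the fused embedding can stay discriminative in the cluttered or poorly lit scenes the abstract highlights.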