Towards Global Localization using Multi-Modal Object-Instance Re-Identification

Aneesh Chavan, Vaibhav Agrawal, Vineeth Bhat, Sarthak Chittawar, Siddharth Srivastava, Chetan Arora, K Madhava Krishna
arXiv - CS - Robotics. Published 2024-09-18. DOI: arxiv-2409.12002 (https://doi.org/arxiv-2409.12002)
Citations: 0

Abstract

Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D datasets. Our approach demonstrates significant improvements in both object instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM-RGBD), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available.
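The abstract reports ReID quality as mean average precision (mAP). As a rough illustration of what that metric measures, the sketch below computes mAP for a retrieval-style ReID evaluation in plain Python. It is a generic textbook formulation, not the authors' evaluation code: the function names, the integer instance IDs, and the toy rankings are all assumptions made for this example. The result is a fraction in [0, 1]; a figure such as 75.18 would presumably correspond to the same quantity on a 0-100 scale.

```python
from typing import Sequence


def average_precision(ranked_ids: Sequence[int], query_id: int) -> float:
    """AP for one query: mean of precision@k over the ranks k that hold a correct match.

    ranked_ids is the gallery, sorted from most to least similar to the query
    (hypothetical input; how similarity is computed is outside this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # A query with no correct match in the gallery contributes an AP of 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    rankings: Sequence[Sequence[int]], query_ids: Sequence[int]
) -> float:
    """mAP: the average of per-query AP values."""
    aps = [average_precision(r, q) for r, q in zip(rankings, query_ids)]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data: two queries, each with a ranked list of gallery instance IDs.
rankings = [[1, 2, 1, 3], [2, 1]]
query_ids = [1, 1]
map_score = mean_average_precision(rankings, query_ids)
print(f"mAP = {map_score:.4f}")
```

In the first toy ranking the correct instance appears at ranks 1 and 3, giving an AP of (1/1 + 2/3) / 2; in the second it appears only at rank 2, giving 1/2. Averaging the two yields the mAP.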