Place recognition meet multiple modalities: a comprehensive review, current challenges and future development

IF 13.9 | Zone 2 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Artificial Intelligence Review Pub Date : 2025-08-30 DOI:10.1007/s10462-025-11367-8

Zhenyu Li, Tianyi Shang, Pengjie Xu, Zhaojun Deng

Volume 58, issue 11. Article page: https://link.springer.com/article/10.1007/s10462-025-11367-8 (PDF: https://link.springer.com/content/pdf/10.1007/s10462-025-11367-8.pdf)

Citations: 0

Abstract



Place recognition is a cornerstone of vehicle navigation and mapping, which is pivotal in enabling systems to determine whether a location has been previously visited. This capability is critical for tasks such as loop closure in Simultaneous Localization and Mapping (SLAM) and long-term navigation under varying environmental conditions. This survey comprehensively reviews recent advancements in place recognition, emphasizing three representative methodological paradigms: Convolutional Neural Network (CNN)-based approaches, Transformer-based frameworks, and cross-modal strategies. We begin by elucidating the significance of place recognition within the broader context of autonomous systems. Subsequently, we trace the evolution of CNN-based methods, highlighting their contributions to robust visual descriptor learning and scalability in large-scale environments. We then examine the emerging class of Transformer-based models, which leverage self-attention mechanisms to capture global dependencies and offer improved generalization across diverse scenes. Furthermore, we discuss cross-modal approaches that integrate heterogeneous data sources such as LiDAR, vision, and text descriptions, thereby enhancing resilience to viewpoint, illumination, and seasonal variations. We also summarize standard datasets and evaluation metrics widely adopted in the literature. To the best of our knowledge, no prior survey has systematically reviewed visual, LiDAR, and cross-modal place recognition concurrently. This work thus resolves a critical gap in existing literature dominated by single-modality studies. Finally, we identify current research challenges and outline prospective directions, including domain adaptation, real-time performance, and lifelong learning, to inspire future advancements in this domain. The unified framework of leading-edge place recognition methods, i.e., a code library, and the results of their experimental evaluations are available at https://github.com/CV4RA/SOTA-Place-Recognitioner.
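
For readers new to the task, the core question in the abstract (has this location been visited before?) can be illustrated as nearest-neighbor retrieval over global descriptors. The sketch below is an illustration only and is not taken from the survey or its code library: the function names, the cosine-similarity matcher, the 0.8 threshold, and the toy three-dimensional descriptors are all assumptions, and the Recall@K helper uses a simplified definition with a single correct database index per query.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize_place(query, database, threshold=0.8):
    """Return (index, similarity) of the best database match.

    The index is None when the best similarity is below `threshold`,
    i.e. the place is treated as not previously visited.
    The threshold value is an arbitrary assumption for this sketch.
    """
    best_index, best_sim = None, -1.0
    for index, descriptor in enumerate(database):
        sim = cosine_similarity(query, descriptor)
        if sim > best_sim:
            best_index, best_sim = index, sim
    if best_sim < threshold:
        return None, best_sim
    return best_index, best_sim


def recall_at_k(queries, ground_truth, database, k=1):
    """Fraction of queries whose true index appears in the top-k matches.

    Simplification: each query has exactly one correct database index.
    """
    if not queries:
        return 0.0
    hits = 0
    for query, true_index in zip(queries, ground_truth):
        ranked = sorted(
            range(len(database)),
            key=lambda i: cosine_similarity(query, database[i]),
            reverse=True,
        )
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy descriptors standing in for the output of any visual or LiDAR encoder.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
match_index, match_sim = recognize_place([0.9, 0.1, 0.0], database)
unseen_index, unseen_sim = recognize_place([1.0, 1.0, 1.0], database)
```

In this toy setup the first query matches database entry 0, while the second falls below the threshold and is reported as a new place. In practice the descriptors would come from a learned CNN, Transformer, or cross-modal encoder, and the linear scan would typically be replaced by an approximate nearest-neighbor index.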


Source journal

Artificial Intelligence Review (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 22.00

Self-citation rate: 3.30%

Articles published: 194

Review time: 5.3 months

About the journal: Artificial Intelligence Review, a fully open access journal, publishes cutting-edge research in artificial intelligence and cognitive science. It features critical evaluations of applications, techniques, and algorithms, providing a platform for both researchers and application developers. The journal includes refereed survey and tutorial articles, along with reviews and commentary on significant developments in the field.