ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer

IEEE Transactions on Artificial Intelligence. Pub date: 2024-12-02. DOI: 10.1109/TAI.2024.3510479

Yifan Xu, Pourya Shamsolmoali, Masoume Zareapoor, Jie Yang

IEEE Transactions on Artificial Intelligence, vol. 6, no. 4, pp. 1038-1049.

Citations: 0

Abstract

Visual place recognition (VPR) is a challenging task with a wide range of applications, including robot navigation and self-driving vehicles. Duplicate regions and insufficient attention to small objects in complex scenes make it especially difficult and lead to recognition deviations. In this article, we present ClusVPR, a novel approach that specifically addresses redundant information in duplicate regions and the inadequate representation of small objects. Unlike existing methods that rely on convolutional neural networks (CNNs) to generate feature maps, ClusVPR introduces a paradigm called the clustering-based weighted transformer network (CWTNet). CWTNet combines clustering-based weighted feature maps with global dependencies to address the visual deviations encountered in large-scale VPR. We also introduce the optimized-VLAD (OptLAD) layer, which is designed to aggregate the information obtained from scale-wise image patches while significantly reducing the number of parameters and improving model efficiency. In addition, our pyramid self-supervised strategy extracts representative and diverse features from scale-wise image patches rather than from entire images, which is essential for capturing the broader range of information required for robust VPR. Extensive experiments on four VPR datasets show that our model outperforms existing models while being less complex.
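The abstract does not say how OptLAD departs from VLAD, so the sketch below shows only the standard NetVLAD aggregation that a VLAD-style layer builds on, plus a hypothetical scale-wise pooling that encodes s x s regions of a feature grid separately. The function names, the scale set (1, 2), alpha = 10, and the toy centers are illustrative assumptions, not the authors' settings; CWTNet's clustering-based weighting and OptLAD's parameter reduction are not modeled here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale v to unit L2 norm; an all-zero vector stays all zero."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def netvlad(descriptors: Sequence[Sequence[float]],
            centers: Sequence[Sequence[float]],
            alpha: float = 10.0) -> Vector:
    """Standard NetVLAD aggregation (NOT the paper's OptLAD layer).

    Each local descriptor x is soft-assigned to every center c_k with weight
    softmax_k(-alpha * ||x - c_k||^2); the weighted residuals (x - c_k) are
    summed per center, each center's block is L2-normalized
    (intra-normalization), and the concatenation is L2-normalized again.
    Returns a flat vector of length K * D.
    """
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in centers]
    for x in descriptors:
        logits = [-alpha * sum((xd - cd) ** 2 for xd, cd in zip(x, c))
                  for c in centers]
        m = max(logits)  # shift by the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        for k, c in enumerate(centers):
            a = exps[k] / total  # soft-assignment weight of x to center k
            for d in range(dim):
                vlad[k][d] += a * (x[d] - c[d])
    flat: Vector = []
    for block in vlad:
        flat.extend(l2_normalize(block))  # intra-normalization
    return l2_normalize(flat)


def pyramid_descriptor(grid: Sequence[Sequence[Sequence[float]]],
                       centers: Sequence[Sequence[float]],
                       scales: Sequence[int] = (1, 2),
                       alpha: float = 10.0) -> Vector:
    """HYPOTHETICAL scale-wise pooling: for each scale s, split the H x W grid
    of local descriptors into s x s regions and NetVLAD-encode each region.
    Output length is sum(s * s for s in scales) * K * D."""
    h, w = len(grid), len(grid[0])
    out: Vector = []
    for s in scales:
        for r in range(s):
            for q in range(s):
                region = [grid[i][j]
                          for i in range(r * h // s, (r + 1) * h // s)
                          for j in range(q * w // s, (q + 1) * w // s)]
                out.extend(netvlad(region, centers, alpha))
    return out


if __name__ == "__main__":
    # Toy 4 x 4 grid of 2-D descriptors and K = 3 hand-picked centers
    # (in practice the centers would come from k-means over training features).
    grid = [[[i / 4.0, j / 4.0] for j in range(4)] for i in range(4)]
    centers = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]]
    v = netvlad([d for row in grid for d in row], centers)
    p = pyramid_descriptor(grid, centers)
    print(len(v), len(p))  # 6 30
```

The distance-based softmax matches the original NetVLAD form softmax_k(w_k·x + b_k) with the tied parameters w_k = 2αc_k and b_k = -α‖c_k‖², because the omitted -α‖x‖² term is identical for every k. Pure Python keeps the sketch dependency-free; a practical layer would use batched tensor operations and learnable centers.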

Source journal: IEEE Transactions on Artificial Intelligence

CiteScore: 7.70

Self-citation rate: 0.00%