Yifan Xu;Pourya Shamsolmoali;Masoume Zareapoor;Jie Yang
ClusVPR: Efficient Visual Place Recognition With Clustering-Based Weighted Transformer
IEEE Transactions on Artificial Intelligence, vol. 6, no. 4, pp. 1038-1049. Journal article, published 2024-12-02.
DOI: 10.1109/TAI.2024.3510479
https://ieeexplore.ieee.org/document/10772618/
Citation count: 0
Abstract
Visual place recognition (VPR) is a highly challenging task with a wide range of applications, including robot navigation and self-driving vehicles. It is difficult because duplicate regions and insufficient attention to small objects in complex scenes lead to recognition deviations. In this article, we present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects. Unlike existing methods that rely on convolutional neural networks (CNNs) for feature map generation, ClusVPR introduces a unique paradigm called the clustering-based weighted transformer network (CWTNet). CWTNet uses clustering-based weighted feature maps and integrates global dependencies to address the visual deviations encountered in large-scale VPR problems. We also introduce the optimized-VLAD (OptLAD) layer, which significantly reduces the number of parameters and enhances model efficiency. This layer is specifically designed to aggregate the information obtained from scale-wise image patches. Additionally, our pyramid self-supervised strategy focuses on extracting representative and diverse features from scale-wise image patches rather than from entire images, which is essential for capturing the broader range of information required for robust VPR. Extensive experiments on four VPR datasets show that our model outperforms existing models while being less complex.
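
For readers unfamiliar with VLAD, the following is a minimal sketch of the classical hard-assignment VLAD aggregation that the name "optimized-VLAD" refers to. It is not the OptLAD layer itself, whose details the abstract does not give. The function name `vlad`, the toy descriptors, and the fixed centroids are all assumptions made for illustration; in practice the centroids would be learned, for example by k-means.

```python
import math


def vlad(descriptors, centroids):
    """Classical VLAD sketch (illustrative only, not the paper's OptLAD).

    descriptors: list of D-dimensional local features (lists of floats).
    centroids:   list of K D-dimensional cluster centers (assumed given).
    Returns a K*D vector: per-cluster sums of residuals, L2-normalized.
    """
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard assignment: pick the nearest centroid (squared Euclidean distance).
        j = min(
            range(k),
            key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[c])),
        )
        # Accumulate the residual between the descriptor and its centroid.
        for t in range(d):
            acc[j][t] += x[t] - centroids[j][t]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy usage: three 2-D "patch descriptors" and two hypothetical centroids.
patches = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
print(vlad(patches, centers))  # [0.5, 0.5, -0.5, -0.5]
```

The output dimension is K times D, which is why VLAD-style layers can become parameter-heavy as the number of clusters or the feature dimension grows. The abstract's claim of a reduced parameter count should be read against that baseline.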