Register assisted aggregation for visual place recognition
Authors: Xuan Yu, Zhenyong Fu
Journal: Journal of Visual Communication and Image Representation, Volume 107, Article 104384 (impact factor 2.6; JCR Q2, Computer Science, Information Systems)
Publication date: 2024-12-24
DOI: 10.1016/j.jvcir.2024.104384
URL: https://www.sciencedirect.com/science/article/pii/S1047320324003407
Visual Place Recognition (VPR) uses computer vision to recognize the location shown in the current query image. Changes in appearance caused by season, lighting, and the time elapsed between query and database images make place recognition difficult. Previous approaches often discard not only irrelevant features (such as sky, roads and vehicles) but also features that can improve recognition accuracy (such as buildings and trees). To address this, we propose a novel feature aggregation method designed to preserve these critical features. Specifically, we introduce additional registers alongside the original image tokens to facilitate model training, enabling the extraction of both global and local features that contain discriminative place information. Once the attention weights have been reallocated, the registers are discarded. Experimental results demonstrate that our approach effectively separates unstable features from the original image representation and achieves superior performance compared to state-of-the-art methods.
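The register idea in the abstract can be illustrated with a small sketch. This is not the authors' architecture: the abstract gives no implementation details, so everything below is an assumption made for illustration. The sketch uses a single attention layer with identity query/key/value projections, random vectors standing in for learned registers, and mean pooling with L2 normalization as the aggregation step. The names `self_attention` and `aggregate_with_registers` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections and many layers.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(dim)]
        )
    return out


def aggregate_with_registers(patch_tokens, registers):
    """Append registers, attend jointly, drop registers, pool what remains."""
    num_patches = len(patch_tokens)
    # Registers take part in attention, so they can absorb attention mass.
    mixed = self_attention(patch_tokens + registers)
    # The registers are discarded here; only image tokens are aggregated.
    kept = mixed[:num_patches]
    dim = len(kept[0])
    pooled = [sum(t[j] for t in kept) / num_patches for j in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


rng = random.Random(0)
# Hypothetical sizes: 16 patch tokens and 4 registers, each of dimension 8.
patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
registers = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
descriptor = aggregate_with_registers(patches, registers)
```

In this toy version the registers influence the output only through the attention step, and the final descriptor has the same dimension as a single image token. A trained model would learn the register vectors rather than sample them.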
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.