Enhancing Semantic Information Representation in Multi-View Geo-Localization through Dual-Branch Network with Feature Consistency Enhancement and Multi-Level Feature Mining

IF 2.0 | Zone 4, Computer Science | JCR Q3: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IET Image Processing | Pub Date: 2025-04-21 | DOI: 10.1049/ipr2.70071

Yang Zheng, Qing Li, Jiangyun Li, Zhenghao Xi, Jie Liu

Citations: 0

Abstract

Metric learning is fundamental to multi-view geo-localization, as it aims to establish a distance metric that minimizes the feature space distance between similar data points while maximizing the separation between dissimilar ones. However, in Siamese networks employed for metric learning, individual branches may exhibit discrepancies in their interpretation of semantic information from input data, resulting in semantically inconsistent feature representations. To address this issue, a method is designed to enhance significant region consistency within multi-view spaces by integrating feature consistency enhancement (FCE) and multi-level feature mining (MLFM) techniques into a dual-branch network. The FCE method emphasizes critical components of the input data, ensuring feature consistency between the two branches. Additionally, the MLFM mechanism facilitates feature integration across multiple levels, thereby enabling a more comprehensive extraction of semantic information. This approach enhances semantic understanding and promotes feature consistency across branches. The proposed method achieves AP values of 82.38% for drone-to-satellite and 77.36% for satellite-to-drone image matching. Notably, the method maintains computational efficiency without significantly affecting inference time. Additionally, improvements are observed in R@1, R@5 and R@10 metrics. The experimental results show that integrating FCE and MLFM into the dual-branch network improves semantic representation and outperforms existing methods.
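The retrieval metrics quoted in the abstract (R@1, R@5, R@10 and AP) can be illustrated with a minimal sketch. It assumes cosine similarity between embeddings and the standard non-interpolated average precision; the abstract does not specify the evaluation protocol, so the paper's exact definition may differ. The function names and the toy embeddings and labels are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query."""
    sims = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -sims[i])

def recall_at_k(ranking, gallery_labels, query_label, k):
    """1.0 if any of the top-k ranked gallery items shares the query's label, else 0.0."""
    top_k = ranking[:k]
    return 1.0 if any(gallery_labels[i] == query_label for i in top_k) else 0.0

def average_precision(ranking, gallery_labels, query_label):
    """Non-interpolated AP: mean of precision values at each rank where a match occurs."""
    hits = 0
    precisions = []
    for rank, idx in enumerate(ranking, start=1):
        if gallery_labels[idx] == query_label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Toy data (hypothetical): one drone-view query matched against four satellite-view items.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]
gallery_labels = ["A", "B", "A", "C"]
query, query_label = [0.6, 0.4], "A"

ranking = rank_gallery(query, gallery)
r_at_1 = recall_at_k(ranking, gallery_labels, query_label, k=1)
r_at_2 = recall_at_k(ranking, gallery_labels, query_label, k=2)
ap = average_precision(ranking, gallery_labels, query_label)
print(ranking, r_at_1, r_at_2, round(ap, 4))
```

Reported figures such as the abstract's AP values are averages of these per-query scores over all queries in a given direction (drone-to-satellite or satellite-to-drone).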

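Source journal

IET Image Processing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 5.40

Self-citation rate: 8.70%

Articles published: 282

Review time: 6 months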

Journal description: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.

Principal topics include:

Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.

Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.

Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.

Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.

Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.

Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.

Current Special Issue Call for Papers:

Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf

AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf

Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf

Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf