HVLF: A Holistic Visual Localization Framework Across Diverse Scenes.

IF 8.9; Zone 1 (Computer Science); Q1 in COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE transactions on neural networks and learning systems Pub Date : 2025-10-01 DOI:10.1109/TNNLS.2025.3580405

Kun Dai, Zhiqiang Jiang, Fuyuan Qiu, Dedong Liu, Tao Xie, Ke Wang, Ruifeng Li, Lijun Zhao

Publication type: Journal Article. Volume: PP. Pages: 18859-18873. Open access: no.

Citations: 0

Abstract

Recently, integrating the multitask learning (MTL) paradigm into scene coordinate regression (SCoRe) techniques has achieved significant success in visual localization tasks. However, the feature extraction ability of existing frameworks is inherently constrained by the rigid weight activation strategy, which prevents each layer from concurrently capturing scene-universal features across diverse scenes and scene-particular attributes unique to each individual scene. In addition, the straightforward network architecture further exacerbates the issue of insufficient feature representation. To address these limitations, we introduce HVLF, a holistic framework that ensures flexible identification of both scene-universal and scene-particular attributes while integrating various attention mechanisms to enhance feature representation effectively. Technically, for the first issue, HVLF proposes a soft weight activation strategy (SWAS) equipped with polyhedral convolution to concurrently optimize scene-shared and scene-specific weights within each layer, which facilitates sufficient discernment of both scene-universal features and scene-particular attributes, thereby boosting the network's capability for comprehensive scene perception. For the second issue, HVLF introduces a mixed attention perception module (MAPM) that incorporates channelwise, spatialwise, and elementwise attention mechanisms to perform multilevel feature fusion, hence extracting discriminative features to regress precise scene coordinates. Extensive experiments on indoor and outdoor datasets prove that HVLF realizes impressive localization performance. In addition, experiments conducted on 3-D object detection and feature matching tasks prove that the two proposed techniques are universal and can be seamlessly inserted into other methods.
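The contrast between a rigid and a soft weight activation strategy can be illustrated with a minimal sketch. The abstract does not give the formulation of SWAS or of polyhedral convolution, so everything below is an assumption made for illustration: the convex blend `w = (1 - g) * shared + g * specific`, the sigmoid gate, and the names `blend_weights`, `linear`, `scene_a`, and `scene_b` are all hypothetical. Under this assumption, a rigid strategy corresponds to a gate fixed at 0 or 1 for each weight, while a soft strategy lets every weight in a layer draw on both the scene-shared and the scene-specific parameters.

```python
import math


def sigmoid(x: float) -> float:
    """Map a gate logit to a blend factor in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend_weights(shared, specific, gate_logits):
    """Element-wise soft blend of two weight matrices (illustrative assumption).

    Each output weight is (1 - g) * shared + g * specific with g = sigmoid(logit).
    A gate saturated at 0 or 1 would reproduce a rigid, either/or choice.
    """
    return [
        [(1.0 - sigmoid(a)) * ws + sigmoid(a) * wk
         for ws, wk, a in zip(row_s, row_k, row_a)]
        for row_s, row_k, row_a in zip(shared, specific, gate_logits)
    ]


def linear(weights, x):
    """Apply a weight matrix to one feature vector (a 1x1 convolution at one pixel)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Hypothetical 2x2 layer: one scene-shared matrix, one scene-specific matrix per scene.
shared = [[1.0, 0.0], [0.0, 1.0]]
specific = {
    "scene_a": [[0.0, 2.0], [2.0, 0.0]],
    "scene_b": [[3.0, 0.0], [0.0, 3.0]],
}
# Zero logits give g = 0.5 everywhere; in a trained network these would be learned.
gate_logits = [[0.0, 0.0], [0.0, 0.0]]

x = [1.0, 2.0]
outputs = {
    scene: linear(blend_weights(shared, w_k, gate_logits), x)
    for scene, w_k in specific.items()
}
print(outputs)
```

The same input feature yields a different response per scene because the scene-specific half of the blend changes, while the shared half is reused across scenes. This mirrors the abstract's goal of capturing scene-universal and scene-particular information within each layer, without claiming to reproduce the paper's actual mechanism.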




Source journal

IEEE transactions on neural networks and learning systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

CiteScore: 23.80
Self-citation rate: 9.60%
Articles published: 2102
Review time: 3-8 weeks

About the journal: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.