Kun Dai, Zhiqiang Jiang, Fuyuan Qiu, Dedong Liu, Tao Xie, Ke Wang, Ruifeng Li, Lijun Zhao
Title: HVLF: A Holistic Visual Localization Framework Across Diverse Scenes
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. PP, pp. 18859-18873 (Journal Article; impact factor 8.9; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-10-01
DOI: 10.1109/TNNLS.2025.3580405 (https://doi.org/10.1109/TNNLS.2025.3580405)
Citations: 0
Abstract
Integrating the multitask learning (MTL) paradigm into scene coordinate regression (SCoRe) techniques has recently achieved significant success in visual localization. However, the feature extraction ability of existing frameworks is inherently constrained by their rigid weight activation strategy, which prevents each layer from concurrently capturing scene-universal features shared across diverse scenes and scene-particular attributes unique to each individual scene. In addition, their straightforward network architecture further exacerbates the problem of insufficient feature representation. To address these limitations, we introduce HVLF, a holistic framework that flexibly identifies both scene-universal and scene-particular attributes while integrating several attention mechanisms to enhance feature representation. For the first issue, HVLF proposes a soft weight activation strategy (SWAS) equipped with polyhedral convolution that concurrently optimizes scene-shared and scene-specific weights within each layer. This lets the network discern both scene-universal features and scene-particular attributes, thereby boosting its capability for comprehensive scene perception. For the second issue, HVLF introduces a mixed attention perception module (MAPM) that incorporates channelwise, spatialwise, and elementwise attention mechanisms to perform multilevel feature fusion, extracting discriminative features from which precise scene coordinates are regressed. Extensive experiments on indoor and outdoor datasets show that HVLF achieves strong localization performance. In addition, experiments on 3-D object detection and feature matching tasks indicate that the two proposed techniques are general and can be inserted into other methods.
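The abstract does not say how SWAS or polyhedral convolution is implemented, so the following is only a rough sketch of the general idea of keeping scene-shared and scene-specific weights active in the same layer. It is not the authors' method. It assumes that "soft" means each scene adds a gated share of its own weights to a common weight matrix. The class `SoftSharedLayer`, the sigmoid gate, and the scene names `office` and `street` are all hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Squash a gate logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class SoftSharedLayer:
    """Hypothetical linear layer (illustration only, not HVLF itself).

    Effective weights for a scene = shared + gate(scene) * specific(scene),
    so shared and scene-specific weights contribute in the same layer.
    """

    def __init__(self, shared: Matrix, specific: Dict[str, Matrix],
                 gate_logits: Dict[str, float]) -> None:
        self.shared = shared
        self.specific = specific
        self.gate_logits = gate_logits

    def effective_weights(self, scene: str) -> Matrix:
        # Soft gate: how strongly the scene-specific weights are mixed in.
        g = sigmoid(self.gate_logits[scene])
        return [
            [w_s + g * w_p for w_s, w_p in zip(row_s, row_p)]
            for row_s, row_p in zip(self.shared, self.specific[scene])
        ]

    def forward(self, x: List[float], scene: str) -> List[float]:
        # Plain matrix-vector product with the blended weights.
        w = self.effective_weights(scene)
        return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


layer = SoftSharedLayer(
    shared=[[1.0, 0.0], [0.0, 1.0]],
    specific={
        "office": [[0.0, 2.0], [0.0, 0.0]],
        "street": [[0.0, 0.0], [2.0, 0.0]],
    },
    gate_logits={"office": 0.0, "street": 0.0},  # sigmoid(0) = 0.5
)

out_office = layer.forward([1.0, 1.0], "office")  # [2.0, 1.0]
out_street = layer.forward([1.0, 1.0], "street")  # [1.0, 2.0]
print(out_office, out_street)
```

The same input gives different outputs per scene, while the shared identity matrix contributes to both. In a trained network the gate logits and both weight sets would be learned, and the layer would be a convolution; the toy above uses a fixed 2x2 linear map only to keep the arithmetic easy to check.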
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.