A multi-layer processing and coarse filtering network for accurate feature matching

Impact Factor 7.6 | Zone 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)

Pattern Recognition Pub Date : 2025-09-24 DOI:10.1016/j.patcog.2025.112464

Yuan Guo, Wenpeng Li, Ping Zhai

Pattern Recognition, Volume 172, Article 112464 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0031320325011276

Citations: 0

Abstract


The core task of feature matching is to establish correspondences between two images. Transformer-based methods have achieved impressive results because they can directly capture the relationships among all features, regardless of the distances between them. However, they also reduce the weight of long-distance texture features and do not integrate global, local, and multi-scale features simultaneously, which limits matching accuracy. To address this issue, we propose a detector-free, Transformer-based feature matching method with multi-level processing and coarse-grained filtering. First, we apply a local window aggregation module that minimizes irrelevant interference through window attention, and we combine local self-attention with global self-attention so that the features gain a global perspective without losing local details. Then, the multi-scale features are processed in layers, integrating multi-scale information into the matching phase so that each layer performs feature matching at a different scale for more precise matches. Additionally, we design a filter that discards incorrectly matched points in the global context, thereby improving the accuracy of the matching points. Extensive experiments demonstrate that our method delivers excellent results compared with current state-of-the-art techniques in pose estimation, homography estimation, and visual localization. Compared with the baseline method LoFTR, our method achieves an average improvement of 16.07% in pose estimation, 6.52% in homography estimation, and 9.69% in visual localization. Our method also demonstrates superior performance compared with other state-of-the-art feature matching approaches.
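The abstract does not describe how the proposed filter works, so the following is only a minimal sketch of a common coarse-matching filter of the kind used by the LoFTR baseline: a dual-softmax confidence matrix followed by a mutual-nearest-neighbour check and a confidence threshold. The function names, the threshold value of 0.2, and the score matrix are hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(scores):
    """Confidence matrix: row-wise softmax times column-wise softmax."""
    rows = [softmax(row) for row in scores]
    cols = [softmax(list(col)) for col in zip(*scores)]  # cols[j][i]
    return [[rows[i][j] * cols[j][i] for j in range(len(scores[0]))]
            for i in range(len(scores))]


def filter_matches(conf, threshold=0.2):
    """Keep (i, j) only if it is a mutual nearest neighbour above threshold."""
    matches = []
    for i, row in enumerate(conf):
        j = max(range(len(row)), key=row.__getitem__)             # best j for i
        best_i = max(range(len(conf)), key=lambda k: conf[k][j])  # best i for j
        if best_i == i and row[j] > threshold:
            matches.append((i, j))
    return matches


# Hypothetical similarity scores: 3 coarse features in image A vs 3 in image B.
scores = [
    [9.0, 1.0, 0.5],
    [0.8, 8.0, 1.2],
    [1.0, 1.1, 1.2],  # ambiguous feature with no clear partner
]
conf = dual_softmax(scores)
matches = filter_matches(conf)
print(matches)  # [(0, 0), (1, 1)]
```

The third feature is dropped because its best confidence stays below the threshold, which illustrates the general idea of discarding unreliable coarse matches before any finer refinement.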


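Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40

Self-citation rate: 16.20%

Articles published: 683

Review time: 5.6 months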

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.