A multi-layer processing and coarse filtering network for accurate feature matching
Authors: Yuan Guo, Wenpeng Li, Ping Zhai
Journal: Pattern Recognition, volume 172, Article 112464 (impact factor 7.6; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-09-24
DOI: 10.1016/j.patcog.2025.112464
URL: https://www.sciencedirect.com/science/article/pii/S0031320325011276
The core task of feature matching is to establish correspondences between two images. Transformer-based methods have achieved impressive results because they can directly capture the relationships among all features without relying on the distances between them. However, they also reduce the weight of long-distance texture features and do not integrate global, local, and multi-scale features simultaneously, which limits matching accuracy. To address this issue, we propose a detector-free, Transformer-based feature matching method with multi-level processing and coarse-grained filtering. First, we apply a local window aggregation module that uses window attention to minimize irrelevant interference, and we combine local self-attention with global self-attention so that the features gain a global perspective without losing local detail. Then, the multi-scale features are processed layer by layer, integrating multi-scale information into the matching phase so that each layer performs feature matching at a different scale for more precise matches. Additionally, we design a filter that discards incorrectly matched points in the global context, thereby improving the accuracy of the matching points. Extensive experiments demonstrate that our method delivers excellent results compared with current state-of-the-art techniques in pose estimation, homography estimation, and visual localization. Compared with the baseline method LoFTR, our method achieves an average improvement of 16.07% in pose estimation, 6.52% in homography estimation, and 9.69% in visual localization. It also outperforms other state-of-the-art feature matching approaches.
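The abstract does not say how the proposed filter decides which matches to discard. As a point of reference only, the sketch below shows one common way that detector-free matchers in the LoFTR family select coarse matches: a dual-softmax confidence matrix followed by a mutual-nearest-neighbor check and a confidence threshold. It is not the authors' filter. The function names, the temperature and threshold values, and the toy descriptors are all assumptions chosen for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim, temperature=0.1):
    """Confidence[i][j] = row-softmax[i][j] * column-softmax[i][j].

    `temperature` is an assumed illustrative value.
    """
    scaled = [[v / temperature for v in row] for row in sim]
    row_sm = [softmax(row) for row in scaled]
    # Softmax over each column; col_sm[j][i] belongs to entry (i, j).
    col_sm = [softmax(list(col)) for col in zip(*scaled)]
    n_rows, n_cols = len(scaled), len(col_sm)
    return [[row_sm[i][j] * col_sm[j][i] for j in range(n_cols)]
            for i in range(n_rows)]


def mutual_matches(conf, threshold=0.2):
    """Keep (i, j) only if each is the other's best match and the
    confidence exceeds `threshold` (an assumed illustrative value)."""
    n_rows, n_cols = len(conf), len(conf[0])
    best_col = [max(range(n_cols), key=lambda j: conf[i][j])
                for i in range(n_rows)]
    best_row = [max(range(n_rows), key=lambda i: conf[i][j])
                for j in range(n_cols)]
    return [(i, j, conf[i][j]) for i, j in enumerate(best_col)
            if best_row[j] == i and conf[i][j] > threshold]


# Toy 2-D descriptors for three coarse features in each image (hypothetical).
desc_a = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
desc_b = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]

# Similarity matrix of dot products between every pair of descriptors.
sim = [[sum(x * y for x, y in zip(a, b)) for b in desc_b] for a in desc_a]

matches = mutual_matches(dual_softmax(sim))
pairs = [(i, j) for i, j, _ in matches]
print(pairs)
```

In this toy case the third feature of the first image is equally similar to two candidates, so its confidence is split between them and falls below the threshold. Discarding ambiguous correspondences of this kind is the general goal of a match filter, whatever its specific design.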
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.