RenKai Xiao, ShengZhi Yuan, Kai Jin, Min Li, Yan Tang, Sen Shen
CMFNet: A Three-Stage Feature Matching Network With Geometric Consistency and Attentional Enhancement
IET Image Processing, vol. 19, no. 1. Published 2025-03-27. Journal Article.
DOI: 10.1049/ipr2.70050
https://onlinelibrary.wiley.com/doi/10.1049/ipr2.70050
Impact factor: 2.0. JCR: Q3 (Computer Science, Artificial Intelligence).
Citation count: 0
Abstract
Current feature matching methods typically use a two-stage process of coarse matching followed by fine matching. However, the transition from the coarse to the fine stage often lacks an effective intermediate state, so the matching process changes abruptly, which can hinder precise localization. To address these limitations, this study introduces Coarse-Mid-Fine Match Net (CMFNet), a novel three-stage image feature matching method. CMFNet inserts an intermediate-grained matching phase between the coarse and fine stages to make the transition more gradual. In this phase, the correspondences obtained from the coarse-grained stage are first refined using adaptive random sample consensus (Adaptive-RANSAC). Subsequently, the midtransformer, which integrates sparse self-attention (SSA) with a cross-attention mechanism based on local region features, is employed for feature extraction. This approach enhances the feature extraction capabilities and improves the adaptability to various types of image data, thereby boosting overall matching performance. The network is trained in a fully self-supervised manner by minimizing a match loss that is generated automatically from the training data using a multi-scale cross-entropy method. Thorough experiments were carried out on diverse real-world datasets, including both unaltered and extensively processed images. The results demonstrate that the proposed method outperforms state-of-the-art approaches, achieving 0.776 mAUC on the HPatches dataset and 0.442 mAUC on the ISC-HE dataset.
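The abstract does not describe how the adaptive RANSAC step works internally, so the following is only a minimal sketch of plain (non-adaptive) RANSAC filtering a set of coarse correspondences. It is not the authors' implementation. The function name `ransac_similarity`, the similarity-transform model, the pixel threshold, and the toy data are all assumptions made for illustration.

```python
import random


def ransac_similarity(src, dst, threshold=3.0, iterations=200, seed=0):
    """Plain RANSAC over point matches with a 2D similarity model q ~ a*p + b.

    Points are (x, y) tuples, handled as complex numbers so that one complex
    multiplication encodes rotation plus uniform scale (an assumed model).
    Returns (inlier_indices, (a, b)).
    """
    rng = random.Random(seed)
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    best_inliers, best_model = [], None
    for _ in range(iterations):
        # A similarity transform is determined by two correspondences.
        i, j = rng.sample(range(len(p)), 2)
        if p[i] == p[j]:
            continue  # degenerate sample, skip it
        a = (q[j] - q[i]) / (p[j] - p[i])
        b = q[i] - a * p[i]
        # Count the matches whose reprojection error is below the threshold.
        inliers = [k for k in range(len(p)) if abs(a * p[k] + b - q[k]) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (a, b)
    return best_inliers, best_model


# Toy "coarse matches": eight consistent pairs followed by two gross outliers.
src = [(0, 0), (10, 0), (0, 10), (10, 10), (20, 5),
       (5, 20), (15, 15), (25, 30), (30, 10), (12, 40)]
true_a, true_b = 2j, 10 + 5j  # rotate by 90 degrees, scale by 2, then translate
dst = []
for x, y in src[:8]:
    z = true_a * complex(x, y) + true_b
    dst.append((z.real, z.imag))
dst += [(200.0, -50.0), (-80.0, 300.0)]  # mismatches the filter should reject

inliers, model = ransac_similarity(src, dst)
print(sorted(inliers), model)
```

In this sketch only the matches consistent with the dominant transform survive, which is the general idea behind refining coarse correspondences with a geometric consistency check before a later matching stage.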
About the journal:
The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.
Principal topics include:
Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.
Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.
Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.
Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.
Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.
Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.