CMFNet: A Three-Stage Feature Matching Network With Geometric Consistency and Attentional Enhancement

IET Image Processing, Vol. 19, Issue 1. Published 2025-03-27. DOI: 10.1049/ipr2.70050

RenKai Xiao, ShengZhi Yuan, Kai Jin, Min Li, Yan Tang, Sen Shen

Abstract

Current feature matching methods typically follow a two-stage process: coarse matching followed by fine matching. However, the step from the coarse to the fine stage often lacks an effective intermediate state, so the matching process changes abruptly, which can hinder a smooth transition and precise localization. To address these limitations, this study introduces Coarse-Mid-Fine Match Net (CMFNet), a three-stage image feature matching method that inserts an intermediate-grained matching phase between the coarse and fine stages to make the transition more gradual. In this intermediate stage, the correspondences produced by the coarse stage are first refined with adaptive random sample consensus (Adaptive-RANSAC). Features are then extracted by the midtransformer, which combines sparse self-attention (SSA) with a cross-attention mechanism based on local region features. This design strengthens feature extraction and improves adaptability to different types of image data, which in turn raises overall matching performance. The network is trained in a fully self-supervised manner, minimizing a match loss that is generated automatically from the training data using a multi-scale cross-entropy formulation. Extensive experiments on diverse real-world datasets, containing both unaltered and heavily processed images, show that the proposed method outperforms state-of-the-art approaches, achieving 0.776 mAUC on the HPatches dataset and 0.442 mAUC on the ISC-HE dataset.
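
The intermediate stage filters coarse correspondences with Adaptive-RANSAC before the midtransformer refines them. The abstract does not specify the geometric model or the adaptive rule, so the sketch below rests on assumptions: it fits a homography with the normalized DLT and uses the textbook adaptive stopping criterion, which lowers the number of iterations as the estimated inlier ratio w grows, N = log(1 - p) / log(1 - w^4). The function names (`fit_homography`, `transfer_error`, `adaptive_ransac`) and the defaults (3 px threshold, 0.99 confidence) are hypothetical and not taken from CMFNet.

```python
import numpy as np


def _normalize(pts):
    # Hartley normalization: move the centroid to the origin and scale so the
    # mean distance from it is sqrt(2); this improves DLT conditioning.
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - mean) * scale, T


def fit_homography(src, dst):
    """Normalized DLT: 3x3 homography mapping src -> dst from >= 4 pairs."""
    s, Ts = _normalize(src)
    d, Td = _normalize(dst)
    rows = []
    for (x, y), (u, v) in zip(s, d):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    Hn = vt[-1].reshape(3, 3)              # null vector of the DLT system
    H = np.linalg.inv(Td) @ Hn @ Ts        # undo the normalization
    return H / H[2, 2]


def transfer_error(H, src, dst):
    """Euclidean distance between H * src and dst, per correspondence."""
    p = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    return np.linalg.norm(p[:, :2] / p[:, 2:3] - dst, axis=1)


def adaptive_ransac(src, dst, thresh=3.0, confidence=0.99, max_iters=2000, seed=0):
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    needed, it = max_iters, 0
    while it < needed:
        it += 1
        idx = rng.choice(len(src), 4, replace=False)
        inliers = transfer_error(fit_homography(src[idx], dst[idx]), src, dst) < thresh
        if inliers.sum() > best.sum():
            best = inliers
            # Adaptive stopping: number of draws needed so that, with probability
            # `confidence`, at least one 4-point sample was outlier-free.
            p_clean = min(best.mean() ** 4, 1 - 1e-12)
            needed = min(max_iters,
                         int(np.ceil(np.log(1 - confidence) / np.log1p(-p_clean))))
    if best.sum() < 4:
        return None, best                  # no usable consensus set
    H = fit_homography(src[best], dst[best])        # refit on all inliers
    return H, transfer_error(H, src, dst) < thresh  # re-score with the refit
```

The returned inlier mask is the kind of filtered correspondence set a mid stage could pass on to attention-based refinement; how CMFNet actually hands matches to the midtransformer is not described in the abstract.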

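The midtransformer pairs sparse self-attention (SSA) with local-region cross-attention. The abstract does not give the sparsity pattern used in SSA; one common way to sparsify self-attention is to keep only the top-k scores for each query and apply the softmax over those alone, which the NumPy sketch below illustrates. `topk_sparse_self_attention` and its parameter `k` are hypothetical names, and the projection matrices stand in for learned weights.

```python
import numpy as np


def topk_sparse_self_attention(x, wq, wk, wv, k=8):
    """Self-attention over N tokens where each query attends only to its top-k keys."""
    q, key, v = x @ wq, x @ wk, x @ wv
    scores = q @ key.T / np.sqrt(q.shape[-1])          # (N, N) scaled dot products
    k = min(k, scores.shape[1])
    top = np.argpartition(scores, -k, axis=1)[:, -k:]  # k largest per row (unordered)
    rows = np.arange(scores.shape[0])[:, None]
    masked = np.full_like(scores, -np.inf)             # dropped keys get -inf
    masked[rows, top] = scores[rows, top]
    masked -= masked.max(axis=1, keepdims=True)        # numerically stable softmax
    weights = np.exp(masked)                           # exp(-inf) == 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

With k equal to the number of tokens this reduces to ordinary dense softmax attention, so k directly trades attention coverage for sparsity.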

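Training minimizes a multi-scale cross-entropy match loss whose labels come from the data itself. A minimal sketch, assuming that each point's ground-truth partner (`gt_idx`, with -1 for points that have no partner) is obtained from a known synthetic warp of the training image, and that per-scale losses are combined by a weighted mean. The temperature, the weights, and the function names are illustrative assumptions rather than CMFNet's published formulation.

```python
import numpy as np


def log_softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def match_cross_entropy(feat_a, feat_b, gt_idx, temperature=0.1):
    """Mean cross-entropy of each point in A against its ground-truth partner in B."""
    logp = log_softmax(feat_a @ feat_b.T / temperature, axis=1)  # (Na, Nb)
    valid = gt_idx >= 0                     # -1 marks points without a partner
    if not valid.any():
        return 0.0
    return float(-logp[np.nonzero(valid)[0], gt_idx[valid]].mean())


def multi_scale_match_loss(scales, weights=None):
    """Weighted mean of per-scale losses; `scales` holds (feat_a, feat_b, gt_idx) tuples."""
    if weights is None:
        weights = [1.0] * len(scales)
    total = sum(w * match_cross_entropy(*s) for w, s in zip(weights, scales))
    return total / sum(weights)
```

Each scale contributes a standard classification loss over candidate partners, so coarse and finer feature maps can be supervised with the same labels resampled to their resolution.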

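Results are reported as mAUC, but the abstract does not define the protocol. A common convention for HPatches homography evaluation is to warp the four image corners with the estimated and ground-truth homographies, take the mean corner error per image pair, and compute the normalized area under the cumulative error curve up to 3, 5 and 10 px. The sketch below assumes mAUC is the mean of those three AUC values; the thresholds and the helper names (`corner_error`, `error_auc`, `mean_auc`) are assumptions and may not match the paper's evaluation.

```python
import numpy as np


def corner_error(H_est, H_gt, w, h):
    """Mean distance between the image corners warped by H_est and by H_gt."""
    corners = np.array([[0, 0, 1], [w - 1, 0, 1], [0, h - 1, 1], [w - 1, h - 1, 1]], float)

    def warp(H):
        p = corners @ H.T
        return p[:, :2] / p[:, 2:3]

    return float(np.linalg.norm(warp(H_est) - warp(H_gt), axis=1).mean())


def error_auc(errors, thresholds):
    """Area under the cumulative error curve up to each threshold, normalized to [0, 1]."""
    errors = np.concatenate([[0.0], np.sort(np.asarray(errors, dtype=float))])
    recall = np.arange(len(errors)) / (len(errors) - 1)   # 0, 1/n, ..., 1
    aucs = []
    for t in thresholds:
        last = np.searchsorted(errors, t)                  # first error >= t
        e = np.append(errors[:last], t)
        r = np.append(recall[:last], recall[last - 1])
        aucs.append(float(np.sum(np.diff(e) * (r[1:] + r[:-1]) / 2) / t))  # trapezoid rule
    return aucs


def mean_auc(errors, thresholds=(3, 5, 10)):
    return float(np.mean(error_auc(errors, thresholds)))
```

For example, per-pair corner errors of 1 px and 20 px give an AUC of 0.375 at a 2 px threshold: half the pairs are within 1 px, and the 20 px pair never counts.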