DeMatch++: Two-View Correspondence Learning Via Deep Motion Field Decomposition and Respective Local-Context Aggregation

IF 18.6 · Tier 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence · Pub Date: 2025-08-07 · DOI: 10.1109/tpami.2025.3596598

Shihua Zhang, Zizhuo Li, Jiayi Ma

Citations: 0

Abstract

Two-view correspondence learning has increasingly focused on the coherence and smoothness of motion fields between image pairs. Conventional methods either regularize the complexity of the field function at substantial computational expense, or apply local filters that prove ineffective for large scene disparities. In this paper, we present DeMatch++, a novel network drawing inspiration from Fourier decomposition principles that decomposes the motion field to retain its primary "low-frequency" and smooth components. This approach achieves implicit regularization with lower computational overhead while exhibiting inherent piecewise smoothness. Specifically, our method decomposes the noise-contaminated motion field into multiple linearly independent basis vectors, generating smooth sub-fields that preserve the main energy of the original field. These sub-fields facilitate the recovery of a cleaner motion field for precise vector derivation. Within this framework, we aggregate local context within each sub-field while enhancing global information across all sub-fields. We also employ a masked decomposition strategy that mitigates the influence of false matches, and construct a compact representation to suppress redundant sub-fields. The complete pipeline is formulated as a discrete learnable architecture, circumventing the need for dense field computation. Extensive experiments demonstrate that DeMatch++ outperforms state-of-the-art methods while maintaining computational efficiency and piecewise smoothness. The code and trained models are publicly available at https://github.com/SuhZhang/DeMatchPlus.
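The decompose-then-reconstruct idea in the abstract can be illustrated with a classical, non-learned analogue. The Python sketch below is not the DeMatch++ network and does not reproduce its code. It assumes a fixed low-frequency 2-D cosine basis (standing in for the learned basis vectors), fits each motion component by masked ridge least squares so that suspected false matches are left out of the decomposition, and re-estimates the mask from the residuals. All names (`cosine_basis`, `fit_smooth_field`, `filter_matches`, `true_motion`) and parameters (`order`, `ridge`, `thresh`, `iters`) are hypothetical choices made for illustration.

```python
import math


def cosine_basis(x, y, order=3):
    """Hypothetical fixed low-frequency 2-D cosine basis on [0, 1]^2."""
    return [math.cos(math.pi * i * x) * math.cos(math.pi * j * y)
            for i in range(order) for j in range(order)]


def solve(a, b):
    """Solve the square system a w = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; a is untouched
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_smooth_field(points, motions, mask, order=3, ridge=1e-6):
    """Masked ridge least squares: express each motion component (dx, dy) in
    the basis using only matches whose mask entry is True, then evaluate the
    reconstructed smooth field at every point."""
    feats = [cosine_basis(x, y, order) for x, y in points]
    k = len(feats[0])
    gram = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    rhs = [[0.0] * k, [0.0] * k]
    for f, motion, keep in zip(feats, motions, mask):
        if not keep:
            continue  # masked decomposition: suspected false matches are left out
        for i in range(k):
            for j in range(k):
                gram[i][j] += f[i] * f[j]
            for c in range(2):
                rhs[c][i] += f[i] * motion[c]
    coeffs = [solve(gram, rhs[c]) for c in range(2)]
    # Each coefficient times its basis function is one smooth sub-field;
    # summing the sub-fields gives the reconstructed motion vector.
    return [tuple(sum(w * b for w, b in zip(coeffs[c], f)) for c in range(2))
            for f in feats]


def filter_matches(points, motions, iters=3, thresh=0.05, order=3):
    """Alternate between fitting the smooth field and masking out matches
    whose motion deviates from it by at least `thresh`."""
    mask = [True] * len(points)
    smooth = list(motions)
    for _ in range(iters):
        smooth = fit_smooth_field(points, motions, mask, order)
        mask = [math.dist(m, s) < thresh for m, s in zip(motions, smooth)]
    return mask, smooth


def true_motion(x, y):
    """Synthetic smooth field chosen to lie in the span of the basis."""
    cx, cy = math.cos(math.pi * x), math.cos(math.pi * y)
    return (0.1 + 0.05 * cx, -0.05 * cy + 0.03 * cx * cy)


if __name__ == "__main__":
    points = [(i / 9, j / 9) for i in range(10) for j in range(10)]
    motions = [true_motion(x, y) for x, y in points]
    motions[44] = (motions[44][0] + 0.5, motions[44][1] - 0.3)  # one false match
    mask, smooth = filter_matches(points, motions)
    print(mask.count(False), mask[44])  # prints: 1 False
```

In the demo, the first unmasked fit is pulled toward the injected false match, so a few neighbouring inliers may be masked temporarily; the masked refit then recovers the smooth field and only index 44 remains flagged. A fixed basis keeps the sketch short. The abstract instead describes learning the decomposition within a network and aggregating local context within each sub-field, which this example does not attempt.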

Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence · Engineering & Technology: Engineering, Electrical & Electronic

CiteScore: 28.40
Self-citation rate: 3.00%
Publication volume: 885
Review time: 8.5 months

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.