{"title":"CM2-STNet: Cross-modal image matching with modal-adaptive feature modulation and sparse transformer fusion","authors":"Zhizheng Zhang , Pengcheng Wei , Peilian Wu , Jindou Zhang , Boshen Chang , Zhenfeng Shao , Mingqiang Guo , Liang Wu , Jiayi Ma","doi":"10.1016/j.inffus.2025.103750","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal image matching is a fundamental task in geospatial analysis, aiming to establish accurate correspondences between images captured by heterogeneous imaging devices. However, significant geometric inconsistencies and nonlinear radiometric distortions lead to large distribution gaps, posing a major challenge for cross-modal matching. Moreover, existing methods often struggle to adaptively capture intra- and inter-modal features at multiple scales and to focus on semantically relevant regions in large-scale scenes. To address these issues, we propose a novel cross-modal image matching network called CM<sup>2</sup>-STNet. Specifically, we introduce a modal-adaptive feature modulation (MAFM) module that dynamically adjusts cross-modal feature representations at multiple scales, thereby enhancing semantic consistency between modalities. In addition, a cross-modal sparse transformer fusion (CM-STF) module is developed to guide the network to concentrate on the most relevant regions, where a Top-k selection mechanism is employed to retain discriminative features while filtering out irrelevant content. Extensive experiments on multimodal remote sensing datasets demonstrate that CM<sup>2</sup>-STNet achieves accurate and robust matching performance, validating its effectiveness and generalization ability in complex real-world scenarios. Code and pre-trained model are available at https://github.com/whuzzzz/CM<sup>2</sup>-STNet.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103750"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525008127","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Multimodal image matching is a fundamental task in geospatial analysis, aiming to establish accurate correspondences between images captured by heterogeneous imaging devices. However, significant geometric inconsistencies and nonlinear radiometric distortions lead to large distribution gaps, posing a major challenge for cross-modal matching. Moreover, existing methods often struggle to adaptively capture intra- and inter-modal features at multiple scales and to focus on semantically relevant regions in large-scale scenes. To address these issues, we propose a novel cross-modal image matching network called CM²-STNet. Specifically, we introduce a modal-adaptive feature modulation (MAFM) module that dynamically adjusts cross-modal feature representations at multiple scales, thereby enhancing semantic consistency between modalities. In addition, a cross-modal sparse transformer fusion (CM-STF) module is developed to guide the network to concentrate on the most relevant regions, where a Top-k selection mechanism is employed to retain discriminative features while filtering out irrelevant content. Extensive experiments on multimodal remote sensing datasets demonstrate that CM²-STNet achieves accurate and robust matching performance, validating its effectiveness and generalization ability in complex real-world scenarios. Code and the pre-trained model are available at https://github.com/whuzzzz/CM2-STNet.
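The abstract does not disclose MAFM's internals, but the idea of dynamically adjusting one modality's features conditioned on the other can be illustrated with a FiLM-style modulation block. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name `ModalAdaptiveModulation`, the global-average-pooled conditioning, and the two-layer MLP are all hypothetical choices. In a multi-scale design, one such block would typically be applied at each level of the feature pyramid.

```python
# Illustrative sketch only: the abstract does not specify MAFM's design.
# One common way to realize "modal-adaptive feature modulation" is FiLM-style
# per-channel scale/shift parameters predicted from the other modality's
# global statistics. All names and sizes here are assumptions.
import torch
import torch.nn as nn

class ModalAdaptiveModulation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the
        # pooled features of the conditioning modality.
        self.to_gamma_beta = nn.Sequential(
            nn.Linear(channels, channels * 2),
            nn.ReLU(inplace=True),
            nn.Linear(channels * 2, channels * 2),
        )

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feat, cond: (B, C, H, W) feature maps from the two modalities.
        stats = cond.mean(dim=(2, 3))                  # (B, C) global context
        gamma, beta = self.to_gamma_beta(stats).chunk(2, dim=1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)      # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return feat * (1 + gamma) + beta               # modulated features
```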
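Likewise, the Top-k selection described for CM-STF can be pictured as cross-attention in which each query keeps only its k strongest key scores and suppresses the rest before the softmax, so attention mass concentrates on the most relevant regions. This is again a minimal sketch under assumed shapes, with a hypothetical class name and default k; the real module's head structure, choice of k, and fusion pathway are not given in the abstract.

```python
# Illustrative sketch only: Top-k sparse cross-attention in the spirit of
# the CM-STF description. The value of k, the single-head layout, and the
# tensor shapes are assumptions, not the paper's specification.
import torch
import torch.nn as nn

class TopKSparseCrossAttention(nn.Module):
    def __init__(self, dim: int, k: int = 32):
        super().__init__()
        self.k = k
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # x: (B, Nq, C) queries from one modality; y: (B, Nk, C) the other.
        q = self.q(x)
        k, v = self.kv(y).chunk(2, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, Nq, Nk) scores

        # Keep only the k largest scores per query; mask the rest to -inf
        # so softmax assigns them zero weight (filters irrelevant content).
        topk = min(self.k, attn.shape[-1])
        kth = attn.topk(topk, dim=-1).values[..., -1:]  # k-th largest score
        attn = attn.masked_fill(attn < kth, float("-inf"))
        attn = attn.softmax(dim=-1)
        return self.proj(attn @ v)                      # fused features
```

Masking before the softmax, rather than zeroing weights afterwards, keeps the retained weights properly normalized, which is the usual design choice for sparse attention of this kind.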
About the Journal
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.