Registration-aware cross-modal interaction network for optical and SAR images

IF 14.7 · CAS Region 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zhong Chen, Xiaolei Zhang, Xueru Xu, Hanruo Chen, Xiaofei Mi, Jian Yang
DOI: 10.1016/j.inffus.2025.103278
Journal: Information Fusion, Volume 123, Article 103278
Publication date: 2025-05-13 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S1566253525003513
Citations: 0

Abstract

The registration of optical and synthetic aperture radar (SAR) images is valuable for exploration because the two modalities are inherently complementary. However, the substantial radiometric and geometric differences between them present a major obstacle to registration. Specifically, optical and SAR images require the integration of precise local features with registration-aware global features, and features within and across modalities must interact efficiently to achieve accurate registration. To tackle this problem, we build a Robust Quadratic Net (RQ-Net) that follows the describe-then-detect paradigm and adopts a dual-encoder-decoder design: the first encoder encodes local features within each modality through vanilla convolutional operators, while the second, an elaborated Multilayer Cross-modal Registration-aware (MCR) encoder, specializes in building global relationships both within and across modalities, operating at multiple scales to extract informative features for registration. Furthermore, to steer the network's training toward feature descriptors better suited to registration, we propose a reconsider loss that reviews whether the least similar positive feature pairs are still matchable, giving RQ-Net a higher matching capability. Through extensive qualitative and quantitative experiments on three paired optical-SAR datasets, RQ-Net is validated as superior in extracting sufficient features for matching and in improving registration success rates while maintaining low registration errors.
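The paper does not publish the exact form of the reconsider loss, but the abstract's description — revisiting the *least similar* positive feature pairs and checking whether they remain matchable — can be illustrated with a minimal hinge-style sketch. Everything below (the function name, the margin parameter, and the cosine-similarity formulation) is an assumption for illustration, not the authors' implementation:

```python
import numpy as np

def reconsider_loss(desc_a, desc_b, margin=0.2):
    """Hypothetical sketch of a 'reconsider'-style loss.

    desc_a, desc_b: (N, D) arrays of L2-normalised descriptors for N
    corresponding (positive) feature pairs from the two modalities.
    The loss revisits the least similar positive pair and penalises it
    only if its similarity falls below (1 - margin), i.e. only when the
    pair is at risk of becoming unmatchable.
    """
    # Cosine similarity of each positive pair (row i of desc_a
    # corresponds to row i of desc_b).
    sims = np.sum(desc_a * desc_b, axis=1)
    # Focus on the least similar positive pair.
    worst = sims.min()
    # Hinge: zero loss once even the worst pair is similar enough.
    return max(0.0, (1.0 - margin) - worst)
```

Under this sketch, well-aligned descriptor pairs contribute no gradient, so training pressure concentrates on the hardest positives — consistent with the abstract's goal of making borderline pairs matchable rather than uniformly tightening all pairs.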
Source journal: Information Fusion (Engineering/Technology — Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.