Modal-invariant progressive representation for multimodal image registration

IF 14.7 | CAS Zone 1, Computer Science | Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li
{"title":"多模态图像配准的模态不变渐进表示","authors":"Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li","doi":"10.1016/j.inffus.2024.102903","DOIUrl":null,"url":null,"abstract":"Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions in the fusion result and edge artifacts. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process, referred to as “translating-then-registering”. However, the gap between modalities makes this approach less cohesive. It introduces more uncertainty during registration, misleading feature alignment at different locations and limiting the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by a progressive correction at multiple feature scales. Specifically, MIPR consists two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previous multi-scale deformation subfields. PFE progressively estimates the remaining deformation subfields at different scales. Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results in several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"2 1","pages":""},"PeriodicalIF":14.7000,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Modal-invariant progressive representation for multimodal image registration\",\"authors\":\"Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li\",\"doi\":\"10.1016/j.inffus.2024.102903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions in the fusion result and edge artifacts. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process, referred to as “translating-then-registering”. However, the gap between modalities makes this approach less cohesive. It introduces more uncertainty during registration, misleading feature alignment at different locations and limiting the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by a progressive correction at multiple feature scales. Specifically, MIPR consists two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previous multi-scale deformation subfields. PFE progressively estimates the remaining deformation subfields at different scales. 
Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results in several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2024-12-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1016/j.inffus.2024.102903\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1016/j.inffus.2024.102903","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions and edge artifacts in the fusion result. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process referred to as "translating-then-registering". However, the gap between modalities makes this approach less cohesive: it introduces more uncertainty during registration, misleads feature alignment at different locations, and limits the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key idea behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by progressive correction at multiple feature scales. Specifically, MIPR consists of two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previously estimated multi-scale deformation subfields; PFE progressively estimates the remaining deformation subfields at different scales. Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results on several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.
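The abstract sketches a coarse-to-fine architecture: features are first projected into a modality-invariant domain via frequency-band decoupling, then deformation subfields are estimated scale by scale, with each scale fusing all previously estimated subfields. The PyTorch sketch below illustrates that general idea only; the paper's actual FAF/PFE implementations are not given in this abstract, so every name here (band_split, ResidualFieldHead, progressive_register), the FFT-based band split, and the residual-field head are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of the coarse-to-fine idea described in the abstract.
# Nothing below is taken from the paper: the FFT band split stands in for the
# "modality-invariant domain based on frequency bands", and the loop stands in
# for FAF (fusing previous subfields) + PFE (estimating the residual subfield).
import torch
import torch.nn as nn
import torch.nn.functional as F


def band_split(x: torch.Tensor, cutoff: float = 0.25):
    """Split a feature map into low/high frequency bands with a 2-D FFT.

    The low band serves as a crude modality-invariant representation:
    it keeps shared structure and discards modality-specific detail.
    """
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h, w = x.shape[-2:]
    yy, xx = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h), torch.linspace(-1.0, 1.0, w),
        indexing="ij",
    )
    mask = ((yy**2 + xx**2).sqrt() <= cutoff).to(x.device)
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real
    return low, x - low


def warp(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp x by a dense displacement field `flow` (B, 2, H, W), in pixels."""
    b, _, h, w = x.shape
    yy, xx = torch.meshgrid(
        torch.arange(h, device=x.device, dtype=x.dtype),
        torch.arange(w, device=x.device, dtype=x.dtype),
        indexing="ij",
    )
    coords = torch.stack((xx, yy), dim=0).unsqueeze(0) + flow  # absolute px
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0  # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(x, torch.stack((gx, gy), dim=-1), align_corners=True)


class ResidualFieldHead(nn.Module):
    """Tiny conv head that predicts a residual deformation subfield."""

    def __init__(self, ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2, 3, padding=1),
        )

    def forward(self, fixed, moving):
        return self.net(torch.cat((fixed, moving), dim=1))


def progressive_register(fixed_pyr, moving_pyr, heads):
    """Coarse-to-fine registration over pyramids ordered coarse -> fine."""
    b, _, h, w = fixed_pyr[0].shape
    field = fixed_pyr[0].new_zeros(b, 2, h, w)
    for lvl, (f_fix, f_mov, head) in enumerate(zip(fixed_pyr, moving_pyr, heads)):
        if lvl > 0:
            # FAF-like step: carry all previous subfields into this scale.
            # The x2 assumes each level doubles the resolution, so pixel
            # displacements double as well.
            field = 2.0 * F.interpolate(field, size=f_fix.shape[-2:],
                                        mode="bilinear", align_corners=True)
        f_fix_low, _ = band_split(f_fix)  # register in the shared low band
        f_mov_low, _ = band_split(f_mov)
        # PFE-like step: estimate the remaining subfield at this scale from
        # the fixed features and the moving features warped by the fused field.
        field = field + head(f_fix_low, warp(f_mov_low, field))
    return field


if __name__ == "__main__":
    ch, levels = 16, 3
    fixed_pyr = [torch.randn(1, ch, 32 * 2**i, 32 * 2**i) for i in range(levels)]
    moving_pyr = [torch.randn(1, ch, 32 * 2**i, 32 * 2**i) for i in range(levels)]
    heads = [ResidualFieldHead(ch) for _ in range(levels)]
    print(progressive_register(fixed_pyr, moving_pyr, heads).shape)
    # -> torch.Size([1, 2, 128, 128])
```

Keeping the field in pixel units and rescaling it at each level is one simple way to accumulate subfields across scales; a real implementation would also need training losses (e.g., a similarity term plus field smoothness) and the paper's two-stage pretraining, neither of which is shown here.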
Source journal
Information Fusion (Engineering & Technology – Computer Science: Theory & Methods)
CiteScore: 33.20
Self-citation rate: 4.30%
Articles per year: 161
Review time: 7.9 months
Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems are welcome.