Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li
{"title":"多模态图像配准的模态不变渐进表示","authors":"Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li","doi":"10.1016/j.inffus.2024.102903","DOIUrl":null,"url":null,"abstract":"Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions in the fusion result and edge artifacts. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process, referred to as “translating-then-registering”. However, the gap between modalities makes this approach less cohesive. It introduces more uncertainty during registration, misleading feature alignment at different locations and limiting the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by a progressive correction at multiple feature scales. Specifically, MIPR consists two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previous multi-scale deformation subfields. PFE progressively estimates the remaining deformation subfields at different scales. Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results in several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"2 1","pages":""},"PeriodicalIF":14.7000,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Modal-invariant progressive representation for multimodal image registration\",\"authors\":\"Jiangang Ding, Yuanlin Zhao, Lili Pei, Yihui Shan, Yiquan Du, Wei Li\",\"doi\":\"10.1016/j.inffus.2024.102903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions in the fusion result and edge artifacts. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process, referred to as “translating-then-registering”. However, the gap between modalities makes this approach less cohesive. It introduces more uncertainty during registration, misleading feature alignment at different locations and limiting the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by a progressive correction at multiple feature scales. Specifically, MIPR consists two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previous multi-scale deformation subfields. PFE progressively estimates the remaining deformation subfields at different scales. 
Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results in several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"2 1\",\"pages\":\"\"},\"PeriodicalIF\":14.7000,\"publicationDate\":\"2024-12-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1016/j.inffus.2024.102903\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1016/j.inffus.2024.102903","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Modal-invariant progressive representation for multimodal image registration
Many applications, such as autonomous driving, rely heavily on multimodal data. However, differences in resolution, viewing angle, and optical path structure cause pixel misalignment between multimodal images, leading to distortions in the fusion result and edge artifacts. In addition to the widely used manual calibration, learning-based methods typically employ a two-stage registration process, referred to as "translating-then-registering". However, the gap between modalities makes this approach less cohesive. It introduces more uncertainty during registration, misleading feature alignment at different locations and limiting the accuracy of the deformation field. To tackle these challenges, we introduce the Modality-Invariant Progressive Representation (MIPR) approach. The key idea behind MIPR is to decouple features from different modalities into a modality-invariant domain based on frequency bands, followed by a progressive correction at multiple feature scales. Specifically, MIPR consists of two main components: the Field Adaptive Fusion (FAF) module and the Progressive Field Estimation (PFE) module. FAF integrates all previously estimated multi-scale deformation subfields. PFE progressively estimates the remaining deformation subfields at different scales. Furthermore, we propose a two-stage pretraining strategy for end-to-end registration. Our approach is simple and robust, achieving impressive visual results in several benchmark tasks, even surpassing the ground truth from manual calibration, and advancing downstream tasks.
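To make the coarse-to-fine idea in the abstract concrete, below is a minimal PyTorch sketch of progressive deformation-field estimation, in which a residual subfield is predicted at each scale and composed with the accumulated field from coarser scales. This is an illustration under stated assumptions, not the authors' implementation: the names `warp`, `SubfieldNet`, and `progressive_register` are hypothetical, a simple addition stands in for the FAF fusion step, and a dyadic (factor-of-2) image pyramid is assumed.

```python
# Hypothetical sketch of coarse-to-fine ("progressive") deformation estimation.
# Not the released MIPR code; module names and fusion rule are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(img, flow):
    """Warp `img` (B,C,H,W) by a dense displacement `flow` (B,2,H,W), in pixels."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=img.device, dtype=img.dtype),
        torch.arange(W, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]   # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]   # displaced y coordinates
    # Normalize coordinates to [-1, 1], as grid_sample expects.
    grid = torch.stack(
        (2.0 * grid_x / (W - 1) - 1.0, 2.0 * grid_y / (H - 1) - 1.0), dim=-1
    )
    return F.grid_sample(img, grid, align_corners=True)

class SubfieldNet(nn.Module):
    """Tiny CNN predicting a residual 2-channel deformation subfield at one scale."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, fixed_feat, moving_feat):
        return self.net(torch.cat((fixed_feat, moving_feat), dim=1))

def progressive_register(fixed_pyr, moving_pyr, subnets):
    """Estimate a deformation field coarse-to-fine.

    `fixed_pyr` / `moving_pyr`: lists of feature maps ordered coarse to fine;
    `subnets`: one SubfieldNet per scale. Each scale warps the moving features
    with the field accumulated so far, then adds its residual subfield.
    """
    flow = None
    for (f, m), net in zip(zip(fixed_pyr, moving_pyr), subnets):
        if flow is None:
            flow = torch.zeros(f.shape[0], 2, *f.shape[-2:],
                               device=f.device, dtype=f.dtype)
        else:
            # Upsample the accumulated field to the finer scale; displacements
            # are in pixels, so their magnitude doubles with a dyadic pyramid.
            flow = 2.0 * F.interpolate(
                flow, size=f.shape[-2:], mode="bilinear", align_corners=True
            )
        m_warp = warp(m, flow)        # apply the subfields estimated so far
        flow = flow + net(f, m_warp)  # fuse in this scale's residual subfield
    return flow
```

In the paper's setting, the pyramid inputs would presumably be the modality-invariant, frequency-band features the abstract describes rather than raw intensities, and the FAF module would replace the simple additive composition used in this sketch.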
Journal introduction:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.