GrFormer：一种新型的用于红外和可见光图像融合的格拉斯曼歧管变压器

IF 15.5 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-06-18 DOI:10.1016/j.inffus.2025.103402

Huan Kang , Hui Li , Xiao-Jun Wu , Tianyang Xu , Rui Wang , Chunyang Cheng , Josef Kittler

{"title":"GrFormer：一种新型的用于红外和可见光图像融合的格拉斯曼歧管变压器","authors":"Huan Kang , Hui Li , Xiao-Jun Wu , Tianyang Xu , Rui Wang , Chunyang Cheng , Josef Kittler","doi":"10.1016/j.inffus.2025.103402","DOIUrl":null,"url":null,"abstract":"<div><div>In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. However, in practice, the source images are often located in a non-Euclidean space, where the Euclidean methods usually cannot encapsulate the intrinsic topological structure. Typically, the inner product performed in the Euclidean space calculates the algebraic similarity rather than the semantic similarity, which results in undesired attention output and a decrease in fusion performance. While the balance of low-level details and high-level semantics should be considered in infrared and visible image fusion task. To address this issue, in this paper, we propose a novel attention mechanism based on Grassmann manifold for infrared and visible image fusion (<strong><em>GrFormer</em></strong>). Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank levels. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. Additionally, to effectively integrate the significant information, we develop a cross-modal fusion strategy (CMS) based on a covariance mask to maximize the complementary properties between different modalities and to suppress the features with high correlation, which are deemed redundant. The experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The codes will be available soon.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"125 ","pages":"Article 103402"},"PeriodicalIF":15.5000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GrFormer: A novel Transformer on Grassmann manifold for infrared and visible image fusion\",\"authors\":\"Huan Kang , Hui Li , Xiao-Jun Wu , Tianyang Xu , Rui Wang , Chunyang Cheng , Josef Kittler\",\"doi\":\"10.1016/j.inffus.2025.103402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. However, in practice, the source images are often located in a non-Euclidean space, where the Euclidean methods usually cannot encapsulate the intrinsic topological structure. Typically, the inner product performed in the Euclidean space calculates the algebraic similarity rather than the semantic similarity, which results in undesired attention output and a decrease in fusion performance. While the balance of low-level details and high-level semantics should be considered in infrared and visible image fusion task. To address this issue, in this paper, we propose a novel attention mechanism based on Grassmann manifold for infrared and visible image fusion (<strong><em>GrFormer</em></strong>). Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank levels. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. Additionally, to effectively integrate the significant information, we develop a cross-modal fusion strategy (CMS) based on a covariance mask to maximize the complementary properties between different modalities and to suppress the features with high correlation, which are deemed redundant. The experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The codes will be available soon.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"125 \",\"pages\":\"Article 103402\"},\"PeriodicalIF\":15.5000,\"publicationDate\":\"2025-06-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525004750\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525004750","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

在图像融合领域，将不同模态的数据建模为线性子空间已经取得了可喜的进展。然而，在实际应用中，源图像往往位于非欧几里得空间中，欧几里得方法通常无法封装其固有的拓扑结构。通常，在欧几里得空间中进行的内积计算的是代数相似度，而不是语义相似度，这导致了不期望的注意力输出和融合性能的下降。而在红外与可见光图像融合任务中，需要考虑到底层细节和高层语义的平衡。为了解决这一问题，本文提出了一种新的基于Grassmann流形的红外与可见光图像融合注意机制（GrFormer）。具体来说，我们的方法通过在Grassmann流形上的投影约束构造一个低秩子空间映射，将注意力特征压缩到不同秩水平的子空间中。这迫使特征解耦为高频细节（局部低秩）和低频语义（全局低秩），从而实现多尺度语义融合。此外，为了有效地整合重要信息，我们开发了一种基于协方差掩模的跨模态融合策略（CMS），以最大限度地提高不同模态之间的互补特性，并抑制高相关性的冗余特征。实验结果表明，在多个图像融合基准测试中，我们的网络在定性和定量上都优于SOTA方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GrFormer: A novel Transformer on Grassmann manifold for infrared and visible image fusion

In the field of image fusion, promising progress has been made by modeling data from different modalities as linear subspaces. However, in practice, the source images are often located in a non-Euclidean space, where the Euclidean methods usually cannot encapsulate the intrinsic topological structure. Typically, the inner product performed in the Euclidean space calculates the algebraic similarity rather than the semantic similarity, which results in undesired attention output and a decrease in fusion performance. While the balance of low-level details and high-level semantics should be considered in infrared and visible image fusion task. To address this issue, in this paper, we propose a novel attention mechanism based on Grassmann manifold for infrared and visible image fusion (GrFormer). Specifically, our method constructs a low-rank subspace mapping through projection constraints on the Grassmann manifold, compressing attention features into subspaces of varying rank levels. This forces the features to decouple into high-frequency details (local low-rank) and low-frequency semantics (global low-rank), thereby achieving multi-scale semantic fusion. Additionally, to effectively integrate the significant information, we develop a cross-modal fusion strategy (CMS) based on a covariance mask to maximize the complementary properties between different modalities and to suppress the features with high correlation, which are deemed redundant. The experimental results demonstrate that our network outperforms SOTA methods both qualitatively and quantitatively on multiple image fusion benchmarks. The codes will be available soon.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Information Fusion 工程技术-计算机：理论方法

CiteScore

33.20

自引率

4.30%

发文量

161

审稿时长

7.9 months

期刊介绍： Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.