Xinyu Liu, Rui Ming, Songlin Du, Lianghua He, Haibo Luo, Guobao Xiao
{"title":"HSENet:用于多模态图像融合的分层语义丰富网络","authors":"Xinyu Liu , Rui Ming , Songlin Du , Lianghua He , Haibo Luo , Guobao Xiao","doi":"10.1016/j.patcog.2025.112043","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper, we propose HSENet, a hierarchical semantic-enriched network capable of generating high-quality fused images with robust global semantic consistency and excellent local detail representation. The core innovation of HSENet lies in its hierarchical enrichment of semantic information through semantic gathering, distribution, and injection. Specifically, the network begins by balancing global information exchange via multi-scale feature aggregation and redistribution while dynamically bridging fusion and segmentation tasks. Following this, a progressive semantic dense injection strategy is introduced, employing dense connections to first inject global semantics into highly consistent infrared features and then propagate the semantic-infrared hybrid features to visible features. This approach effectively enhances semantic representation while minimizing high-frequency information loss. Furthermore, HSENet includes two types of feature fusion modules, to leverage cross-modal attention for more comprehensive feature fusion and utilize semantic features as a third input to further enhance the semantic representation for image fusion. These modules achieve robust and flexible feature fusion in complex scenarios by dynamically balancing global semantic consistency and fine-grained local detail representation. Our approach excels in visual perception tasks while fully preserving the texture features from the source modalities. The comparison experiments of image fusion and semantic segmentation demonstrate the superiority of HSENet in visual quality and semantic preservation. 
The code is available at <span><span>https://github.com/Lxyklmyt/HSENet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112043"},"PeriodicalIF":7.6000,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HSENet: Hierarchical semantic-enriched network for multi-modal image fusion\",\"authors\":\"Xinyu Liu , Rui Ming , Songlin Du , Lianghua He , Haibo Luo , Guobao Xiao\",\"doi\":\"10.1016/j.patcog.2025.112043\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In this paper, we propose HSENet, a hierarchical semantic-enriched network capable of generating high-quality fused images with robust global semantic consistency and excellent local detail representation. The core innovation of HSENet lies in its hierarchical enrichment of semantic information through semantic gathering, distribution, and injection. Specifically, the network begins by balancing global information exchange via multi-scale feature aggregation and redistribution while dynamically bridging fusion and segmentation tasks. Following this, a progressive semantic dense injection strategy is introduced, employing dense connections to first inject global semantics into highly consistent infrared features and then propagate the semantic-infrared hybrid features to visible features. This approach effectively enhances semantic representation while minimizing high-frequency information loss. Furthermore, HSENet includes two types of feature fusion modules, to leverage cross-modal attention for more comprehensive feature fusion and utilize semantic features as a third input to further enhance the semantic representation for image fusion. 
These modules achieve robust and flexible feature fusion in complex scenarios by dynamically balancing global semantic consistency and fine-grained local detail representation. Our approach excels in visual perception tasks while fully preserving the texture features from the source modalities. The comparison experiments of image fusion and semantic segmentation demonstrate the superiority of HSENet in visual quality and semantic preservation. The code is available at <span><span>https://github.com/Lxyklmyt/HSENet</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"170 \",\"pages\":\"Article 112043\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-07-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325007034\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325007034","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
HSENet: Hierarchical semantic-enriched network for multi-modal image fusion
In this paper, we propose HSENet, a hierarchical semantic-enriched network capable of generating high-quality fused images with robust global semantic consistency and excellent local detail representation. The core innovation of HSENet lies in its hierarchical enrichment of semantic information through semantic gathering, distribution, and injection. Specifically, the network begins by balancing global information exchange via multi-scale feature aggregation and redistribution while dynamically bridging the fusion and segmentation tasks. Following this, a progressive semantic dense injection strategy is introduced: dense connections first inject global semantics into the highly consistent infrared features, and the resulting semantic-infrared hybrid features are then propagated to the visible features. This approach effectively enhances semantic representation while minimizing high-frequency information loss. Furthermore, HSENet includes two types of feature fusion modules, which leverage cross-modal attention for more comprehensive feature fusion and use semantic features as a third input to further strengthen the semantic representation for image fusion. These modules achieve robust and flexible feature fusion in complex scenarios by dynamically balancing global semantic consistency and fine-grained local detail representation. Our approach excels in visual perception tasks while fully preserving the texture features of the source modalities. Comparative experiments on image fusion and semantic segmentation demonstrate the superiority of HSENet in visual quality and semantic preservation. The code is available at https://github.com/Lxyklmyt/HSENet.
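The two-stage injection order described in the abstract (global semantics into the infrared branch first, then the semantic-infrared hybrid into the visible branch, followed by cross-modal fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dense injection is reduced here to a concatenate-and-project step, the cross-modal attention to a single scaled dot-product attention, and all shapes, names, and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_inject(target, semantic, w):
    """Toy 'dense injection': concatenate semantic features onto the target
    branch and project back to the original channel width."""
    hybrid = np.concatenate([target, semantic], axis=-1)  # (N, 2C)
    return hybrid @ w                                     # (N, C)

def cross_modal_attention(q_feats, kv_feats):
    """Toy cross-modal attention: tokens of one modality attend over the
    tokens of the other via scaled dot-product attention."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)            # (N, M)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ kv_feats                             # (N, d)

C = 8
ir = rng.normal(size=(16, C))    # infrared features (16 spatial tokens)
vis = rng.normal(size=(16, C))   # visible features
sem = rng.normal(size=(16, C))   # global semantics (e.g. from a segmentation head)
w = rng.normal(size=(2 * C, C)) / np.sqrt(2 * C)  # shared toy projection

# Stage 1: inject global semantics into the (more semantically consistent)
# infrared branch.
ir_sem = dense_inject(ir, sem, w)
# Stage 2: propagate the semantic-infrared hybrid into the visible branch.
vis_sem = dense_inject(vis, ir_sem, w)
# Fusion: cross-modal attention lets each branch gather context from the other.
fused = cross_modal_attention(ir_sem, vis_sem)
print(fused.shape)  # (16, 8)
```

The ordering matters in the paper's design: because infrared features are described as more consistent with global semantics, they receive the injection first and then act as a semantic carrier toward the visible branch, rather than injecting both branches symmetrically.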
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.