Xinyu Liu, Rui Ming, Songlin Du, Lianghua He, Haibo Luo, Guobao Xiao
{"title":"HSENet:用于多模态图像融合的分层语义丰富网络","authors":"Xinyu Liu , Rui Ming , Songlin Du , Lianghua He , Haibo Luo , Guobao Xiao","doi":"10.1016/j.patcog.2025.112043","DOIUrl":null,"url":null,"abstract":"<div><div>In this paper, we propose HSENet, a hierarchical semantic-enriched network capable of generating high-quality fused images with robust global semantic consistency and excellent local detail representation. The core innovation of HSENet lies in its hierarchical enrichment of semantic information through semantic gathering, distribution, and injection. Specifically, the network begins by balancing global information exchange via multi-scale feature aggregation and redistribution while dynamically bridging fusion and segmentation tasks. Following this, a progressive semantic dense injection strategy is introduced, employing dense connections to first inject global semantics into highly consistent infrared features and then propagate the semantic-infrared hybrid features to visible features. This approach effectively enhances semantic representation while minimizing high-frequency information loss. Furthermore, HSENet includes two types of feature fusion modules, to leverage cross-modal attention for more comprehensive feature fusion and utilize semantic features as a third input to further enhance the semantic representation for image fusion. These modules achieve robust and flexible feature fusion in complex scenarios by dynamically balancing global semantic consistency and fine-grained local detail representation. Our approach excels in visual perception tasks while fully preserving the texture features from the source modalities. The comparison experiments of image fusion and semantic segmentation demonstrate the superiority of HSENet in visual quality and semantic preservation. 
The code is available at <span><span>https://github.com/Lxyklmyt/HSENet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112043"},"PeriodicalIF":7.6000,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"HSENet: Hierarchical semantic-enriched network for multi-modal image fusion\",\"authors\":\"Xinyu Liu , Rui Ming , Songlin Du , Lianghua He , Haibo Luo , Guobao Xiao\",\"doi\":\"10.1016/j.patcog.2025.112043\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In this paper, we propose HSENet, a hierarchical semantic-enriched network capable of generating high-quality fused images with robust global semantic consistency and excellent local detail representation. The core innovation of HSENet lies in its hierarchical enrichment of semantic information through semantic gathering, distribution, and injection. Specifically, the network begins by balancing global information exchange via multi-scale feature aggregation and redistribution while dynamically bridging fusion and segmentation tasks. Following this, a progressive semantic dense injection strategy is introduced, employing dense connections to first inject global semantics into highly consistent infrared features and then propagate the semantic-infrared hybrid features to visible features. This approach effectively enhances semantic representation while minimizing high-frequency information loss. Furthermore, HSENet includes two types of feature fusion modules, to leverage cross-modal attention for more comprehensive feature fusion and utilize semantic features as a third input to further enhance the semantic representation for image fusion. 
These modules achieve robust and flexible feature fusion in complex scenarios by dynamically balancing global semantic consistency and fine-grained local detail representation. Our approach excels in visual perception tasks while fully preserving the texture features from the source modalities. The comparison experiments of image fusion and semantic segmentation demonstrate the superiority of HSENet in visual quality and semantic preservation. The code is available at <span><span>https://github.com/Lxyklmyt/HSENet</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":49713,\"journal\":{\"name\":\"Pattern Recognition\",\"volume\":\"170 \",\"pages\":\"Article 112043\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-07-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0031320325007034\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320325007034","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
HSENet: Hierarchical semantic-enriched network for multi-modal image fusion
In this paper, we propose HSENet, a hierarchical semantic-enriched network capable of generating high-quality fused images with robust global semantic consistency and excellent local detail representation. The core innovation of HSENet lies in its hierarchical enrichment of semantic information through semantic gathering, distribution, and injection. Specifically, the network begins by balancing global information exchange via multi-scale feature aggregation and redistribution while dynamically bridging the fusion and segmentation tasks. Following this, a progressive semantic dense injection strategy is introduced: dense connections first inject global semantics into the highly consistent infrared features, and the resulting semantic-infrared hybrid features are then propagated to the visible features. This approach effectively enhances semantic representation while minimizing high-frequency information loss. Furthermore, HSENet includes two types of feature fusion modules, which leverage cross-modal attention for more comprehensive feature fusion and use semantic features as a third input to further strengthen the semantic representation for image fusion. These modules achieve robust and flexible feature fusion in complex scenarios by dynamically balancing global semantic consistency and fine-grained local detail representation. Our approach excels in visual perception tasks while fully preserving the texture features of the source modalities. Comparative experiments on image fusion and semantic segmentation demonstrate the superiority of HSENet in visual quality and semantic preservation. The code is available at https://github.com/Lxyklmyt/HSENet.
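The two-stage injection order described in the abstract (global semantics into the infrared branch first, then the semantic-infrared hybrid into the visible branch, followed by cross-modal fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dense injection is reduced here to a concatenate-and-project step, the cross-modal attention to a single scaled dot-product attention, and all shapes, names, and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_inject(target, semantic, w):
    """Toy 'dense injection': concatenate semantic features onto the target
    branch and project back to the original channel width."""
    hybrid = np.concatenate([target, semantic], axis=-1)  # (N, 2C)
    return hybrid @ w                                     # (N, C)

def cross_modal_attention(q_feats, kv_feats):
    """Toy cross-modal attention: tokens of one modality attend over the
    tokens of the other via scaled dot-product attention."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)            # (N, M)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ kv_feats                             # (N, d)

C = 8
ir = rng.normal(size=(16, C))    # infrared features (16 spatial tokens)
vis = rng.normal(size=(16, C))   # visible features
sem = rng.normal(size=(16, C))   # global semantics (e.g. from a segmentation head)
w = rng.normal(size=(2 * C, C)) / np.sqrt(2 * C)  # shared toy projection

# Stage 1: inject global semantics into the (more semantically consistent)
# infrared branch.
ir_sem = dense_inject(ir, sem, w)
# Stage 2: propagate the semantic-infrared hybrid into the visible branch.
vis_sem = dense_inject(vis, ir_sem, w)
# Fusion: cross-modal attention lets each branch gather context from the other.
fused = cross_modal_attention(ir_sem, vis_sem)
print(fused.shape)  # (16, 8)
```

The ordering matters in the paper's design: because infrared features are described as more consistent with global semantics, they receive the injection first and then act as a semantic carrier toward the visible branch, rather than injecting both branches symmetrically.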
Journal introduction:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.