MemoryFusion: A novel architecture for infrared and visible image fusion based on memory unit
Jiachen He, Xiaoqing Luo, Zhancheng Zhang, Xiao-jun Wu
Pattern Recognition, Volume 170, Article 112004 (published 2025-06-21)
DOI: 10.1016/j.patcog.2025.112004
https://www.sciencedirect.com/science/article/pii/S0031320325006648
Citations: 0
Abstract
Existing image fusion methods use elaborate encoders to sequentially extract shallow and deep features from the source images. However, most methods lack long-term dependency; that is, shallow details are inevitably lost as the network encodes deeper features. To address this, some methods employ skip connections or dense connections to pass shallow features directly into deeper layers, which can introduce redundant information and increase computational load. To overcome these drawbacks and improve generalization in low-quality scenarios, a novel fusion architecture based on the Gated Recurrent Unit (GRU), termed MemoryFusion, is proposed. First, the Input Extension Encoder (IEE) transforms each source image into a feature sequence. Then, a Recurrent Fusion Encoder (RFE) built from Recurrent Memory Fusion Units (RMFUs) learns the intrinsic correlation between the multi-modality feature sequences and generates the fused feature sequence. The memory fusion unit uses a gating mechanism to combine historical information with the current input, adaptively retaining valuable content while forgetting redundant information; this design also relieves computational pressure. Finally, because modality information is distributed across different sequence depths and varying illumination intensities, the Multi-hierarchical Aggregation Module (MHAM) produces a corresponding weight sequence, and the aggregated fusion feature is obtained by integrating the fused feature sequence with these weights. Extensive experiments demonstrate that MemoryFusion outperforms state-of-the-art fusion methods on multiple datasets. Even on low-quality images, such as low-light or foggy scenes, the method maintains strong fusion performance and scene fidelity.
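The gating mechanism the abstract describes follows the standard GRU update: an update gate decides how much historical fused content to keep, and a reset gate suppresses redundant history when forming the candidate state. The sketch below is an illustrative, simplified NumPy version of that recurrence applied to a feature sequence, with a softmax weighting standing in for the learned MHAM weights; all names, dimensions, and the aggregation scoring are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_fusion_step(h_prev, x_t, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU-style memory-fusion step (standard GRU equations).

    z: update gate -- how much historical fused content to carry forward.
    r: reset gate  -- damps redundant history in the candidate state.
    """
    z = sigmoid(Wz @ x_t + Uz @ h_prev)               # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)               # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev))   # candidate fused feature
    return (1 - z) * h_prev + z * h_tilde             # keep valuable, forget redundant

# Toy run: fuse a short feature sequence (stand-in for the IEE output).
rng = np.random.default_rng(0)
d = 8
params = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]  # Wz, Uz, Wr, Ur, Wh, Uh
h = np.zeros(d)
seq = [rng.standard_normal(d) for _ in range(4)]

fused_seq = []
for x in seq:
    h = gru_fusion_step(h, x, *params)
    fused_seq.append(h)

# Illustrative aggregation: one softmax weight per sequence depth,
# a placeholder for the weight sequence a learned MHAM would produce.
scores = np.array([f.mean() for f in fused_seq])
w = np.exp(scores) / np.exp(scores).sum()
aggregated = sum(wi * fi for wi, fi in zip(w, fused_seq))
print(aggregated.shape)
```

Because each state is a convex combination of the previous state and a tanh candidate, the fused features stay bounded regardless of sequence length, which is the sense in which the recurrence retains long-term information without accumulating redundancy.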
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.