MSCS: Multi-stage feature learning with channel-spatial attention mechanism for infrared and visible image fusion

Zhenghua Huang, Biyun Xu, Menghan Xia, Qian Li, Lianying Zou, Shaoyi Li, Xi Li

Infrared Physics & Technology, published 2024-08-22. DOI: 10.1016/j.infrared.2024.105514
URL: https://www.sciencedirect.com/science/article/pii/S1350449524003980
Citations: 0
Abstract
Infrared and visible image fusion aims to combine images of the same scene captured by different sensor modalities in order to enhance scene understanding. Deep learning has proven powerful for image fusion owing to its strong generalization, robustness, and ability to represent deep features. However, the performance of deep learning-based methods depends heavily on illumination conditions: in dark or over-exposed scenes, the fused results are over-smoothed and low-contrast, which degrades object-detection accuracy. To address these issues, this paper develops a multi-stage feature learning approach with a channel-spatial attention mechanism, named MSCS, for infrared and visible image fusion. MSCS consists of four key procedures. First, the infrared and visible images are decomposed into illumination and reflectance components by a proposed network called Retinex_Net. Then, the components are passed to an encoder for feature coding. Next, an adaptive fusion module with attention mechanisms fuses the features. Finally, the fused image is generated by a decoder that decodes the fused features. In addition, a novel fusion loss function and a multi-stage training strategy are proposed to train these modules. Subjective and objective results on the TNO, LLVIP, and MSRS datasets show that the proposed method is effective and outperforms state-of-the-art fusion methods, producing visually pleasing results in dark or over-exposed scenes. Further object-detection experiments on the fused images demonstrate that the fusion outputs produced by MSCS are more beneficial for detection tasks.
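The abstract does not specify the internals of the adaptive fusion module. As a rough, hypothetical sketch of how a channel-spatial attention fusion step of this general kind can operate, the numpy example below gates each modality's feature map by a channel attention (global average pooling plus sigmoid) and a spatial attention (channel-wise mean plus sigmoid), then blends the two modalities with normalized weights. All function names, shapes, and the normalization scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Per-channel gate from global average pooling: (C,H,W) -> (C,1,1)."""
    pooled = feat.mean(axis=(1, 2), keepdims=True)
    return sigmoid(pooled)

def spatial_attention(feat):
    """Per-pixel gate from the channel-wise mean: (C,H,W) -> (1,H,W)."""
    pooled = feat.mean(axis=0, keepdims=True)
    return sigmoid(pooled)

def attention_fuse(feat_ir, feat_vis):
    """Adaptively fuse two feature maps with channel-spatial attention.

    Each modality is weighted by the product of its own channel and
    spatial gates; the two weights are then normalized so the fused
    map stays on the scale of the inputs.
    """
    att_ir = channel_attention(feat_ir) * spatial_attention(feat_ir)
    att_vis = channel_attention(feat_vis) * spatial_attention(feat_vis)
    total = att_ir + att_vis + 1e-8  # avoid division by zero
    return (att_ir * feat_ir + att_vis * feat_vis) / total

# Toy example: 4-channel 8x8 feature maps from each modality.
rng = np.random.default_rng(0)
f_ir = rng.standard_normal((4, 8, 8))
f_vis = rng.standard_normal((4, 8, 8))
fused = attention_fuse(f_ir, f_vis)
print(fused.shape)  # (4, 8, 8)
```

Because the per-modality weights are normalized to sum to one, each fused value is a convex combination of the corresponding infrared and visible features, which keeps the output in the dynamic range of the inputs.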
About the journal
The Journal covers the entire field of infrared physics and technology: theory, experiment, application, devices and instrumentation. "Infrared" is defined as covering the near, mid and far infrared (terahertz) regions from 0.75 µm (750 nm) to 1 mm (300 GHz). Submissions in the 300 GHz to 100 GHz region may be accepted at the editors' discretion if their content is relevant to shorter wavelengths. Submissions must be primarily concerned with and directly relevant to this spectral region.
Its core topics can be summarized as the generation, propagation and detection of infrared radiation; the associated optics, materials and devices; and its use in all fields of science, industry, engineering and medicine.
Infrared techniques occur in many different fields, notably spectroscopy and interferometry; material characterization and processing; atmospheric physics, astronomy and space research. Scientific aspects include lasers, quantum optics, quantum electronics, image processing and semiconductor physics. Some important applications are medical diagnostics and treatment, industrial inspection and environmental monitoring.