{"title":"EDFusion: Edge-guided attention and dynamic receptive field with dense residual for multi-focus image fusion","authors":"Hao Zhai, Zhendong Xu, Zhi Zeng, Lei Yu, Bo Lin","doi":"10.1016/j.imavis.2025.105763","DOIUrl":null,"url":null,"abstract":"<div><div>Multi-focus image fusion (MFIF) synthesizes a fully focused image by integrating multiple partially focused images captured at distinct focal planes of the same scene. However, existing methods often fall short in preserving edge and texture details. To address this issue, this paper proposes a network for multi-focus image fusion that incorporates edge-guided attention and dynamic receptive field dense residuals. The network employs a specially designed dynamic receptive field dense residual block (DRF-DRB) to achieve adaptive multi-scale feature extraction, providing rich contextual information for subsequent fine fusion. Building on this, an edge-guided fusion module (EGFM) explicitly leverages the differences in source images as edge priors to generate dedicated weight maps for each feature channel, enabling precise boundary preservation. To efficiently model global dependencies, we introduce a multi-scale token mixing transformer (MSTM-Transformer), designed to reduce computational complexity while enhancing cross-scale semantic interactions. Finally, a refined multi-scale context upsampling module (MSCU) reconstructs high-frequency details. Experiments were conducted on five public datasets, comparing against twelve state-of-the-art methods and evaluated using nine metrics. Both quantitative and qualitative results demonstrate that the proposed method significantly outperforms existing approaches in fusion performance. Notably, on the Lytro dataset, the proposed method ranked first across eight core metrics, achieving high scores of 1.1946 in the information preservation metric (<span><math><msub><mrow><mi>Q</mi></mrow><mrow><mi>N</mi><mi>M</mi><mi>I</mi></mrow></msub></math></span>) and 0.7629 in the edge information fidelity metric (<span><math><msub><mrow><mi>Q</mi></mrow><mrow><mi>A</mi><mi>B</mi><mo>/</mo><mi>F</mi></mrow></msub></math></span>).</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"163 ","pages":"Article 105763"},"PeriodicalIF":4.2000,"publicationDate":"2025-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625003518","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Multi-focus image fusion (MFIF) synthesizes a fully focused image by integrating multiple partially focused images captured at distinct focal planes of the same scene. However, existing methods often fall short in preserving edge and texture details. To address this issue, this paper proposes a network for multi-focus image fusion that incorporates edge-guided attention and dynamic receptive field dense residuals. The network employs a specially designed dynamic receptive field dense residual block (DRF-DRB) to achieve adaptive multi-scale feature extraction, providing rich contextual information for subsequent fine fusion. Building on this, an edge-guided fusion module (EGFM) explicitly leverages the differences between source images as edge priors to generate dedicated weight maps for each feature channel, enabling precise boundary preservation. To efficiently model global dependencies, we introduce a multi-scale token mixing transformer (MSTM-Transformer), designed to reduce computational complexity while enhancing cross-scale semantic interactions. Finally, a refined multi-scale context upsampling module (MSCU) reconstructs high-frequency details. Experiments on five public datasets compare the proposed method against twelve state-of-the-art methods using nine metrics. Both quantitative and qualitative results demonstrate that the proposed method significantly outperforms existing approaches in fusion performance. Notably, on the Lytro dataset, the proposed method ranked first across eight core metrics, achieving high scores of 1.1946 in the information preservation metric (Q_NMI) and 0.7629 in the edge information fidelity metric (Q_AB/F).
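The abstract describes the edge-guided fusion module (EGFM) only at a high level: the differences between the source images serve as an edge prior from which per-channel weight maps are derived. The PyTorch sketch below illustrates that general idea under stated assumptions; the module name EdgeGuidedWeighting, the layer choices, and the tensor shapes are illustrative and are not the authors' implementation.

```python
# Minimal sketch of edge-guided channel weighting, loosely following the
# abstract's description of the EGFM. All names and shapes are assumptions.
import torch
import torch.nn as nn


class EdgeGuidedWeighting(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Project the single-channel edge prior to one weight map per feature channel.
        self.edge_proj = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),  # keeps each weight in (0, 1)
        )

    def forward(self, feat_a, feat_b, img_a, img_b):
        # Edge prior: regions where the two source images disagree tend to
        # coincide with focus boundaries.
        edge_prior = (img_a - img_b).abs().mean(dim=1, keepdim=True)  # B x 1 x H x W
        w = self.edge_proj(edge_prior)                                # B x C x H x W
        # Per-pixel, per-channel convex combination of the two feature streams.
        return w * feat_a + (1.0 - w) * feat_b


if __name__ == "__main__":
    B, C, H, W = 1, 16, 64, 64
    img_a, img_b = torch.rand(B, 3, H, W), torch.rand(B, 3, H, W)
    feat_a, feat_b = torch.rand(B, C, H, W), torch.rand(B, C, H, W)
    fused = EdgeGuidedWeighting(C)(feat_a, feat_b, img_a, img_b)
    print(fused.shape)  # torch.Size([1, 16, 64, 64])
```

Because the sigmoid bounds each weight, the fusion is a soft selection between the two feature streams guided by where the inputs differ; the paper's actual EGFM likely differs in architecture and detail.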
About the journal:
The primary aim of Image and Vision Computing is to provide an effective medium for exchanging the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to deepen understanding in the discipline by encouraging the quantitative comparison and performance evaluation of proposed methodologies. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.