Gengrui Li, Daoyun Tang, Jinhuan Huang, Shaoning Zhu, Jiangtao Cao
Engineering Applications of Artificial Intelligence, Volume 163, Article 112777. Published 2025-10-22. DOI: 10.1016/j.engappai.2025.112777. Impact factor: 8.0; JCR Q1 (Automation & Control Systems). https://www.sciencedirect.com/science/article/pii/S0952197625028088
Dual-scale adaptive attention-based Vision transformer with iterative refinement for clarity and consistency in multi-focus image fusion
Multi-focus Image Fusion (MFIF) plays a prominent role in combining the focused regions of several source images into a single all-in-focus fused image. However, existing approaches struggle to maintain global spatial coherence and sharp details at the same time. To overcome these limitations, the Dual-Scale Adaptive Attention-Based Vision Transformer (DAA-ViT) model is proposed, which integrates fine-scale and coarse-scale attention to preserve local high-resolution information along with structural coherence. Additionally, Iterative Refinement Fusion (IRF) is introduced to refine focus boundaries over multiple iterations, enhancing overall image definition while mitigating fusion artifacts and focus selection errors. In particular, this Artificial Intelligence (AI)-based approach is effective in complex scenes with inconsistent depth levels, which makes it suitable for applications such as remote sensing and medical image processing. Experimental results on several benchmark datasets demonstrate that the proposed method outperforms existing methods, with a Mutual Information (MI) of 8.9671, a Structural Similarity Index Measure (SSIM) of 0.9211, a Peak Signal-to-Noise Ratio (PSNR) of 36.728 dB, and a lower Root Mean Square Error (RMSE) of 1.5482. Compared to the existing Swin Transformer and Convolutional Neural Network (STCU-Net) model, the proposed model attains a 2.65% improvement in PSNR, a 1.99% improvement in MI, a 1.11% improvement in SSIM, and a 5.13% reduction in RMSE. These findings demonstrate the effectiveness of AI-based fusion strategies in delivering high-quality all-in-focus images and highlight their applications in medical imaging and remote sensing.
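For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of the textbook definitions of RMSE, PSNR, and histogram-based mutual information for 8-bit grayscale images. It is not the authors' evaluation code: the function names, the `peak=255.0` dynamic range, the 256-bin histogram, and the choice of which image pair to compare are all assumptions made for illustration. The abstract does not state these conventions, so values computed this way need not match the reported figures.

```python
import numpy as np


def rmse(reference, fused):
    """Root mean square error between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(fused, dtype=np.float64)
    return float(np.sqrt(np.mean((ref - out) ** 2)))


def psnr(reference, fused, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is an assumed dynamic range."""
    err = rmse(reference, fused)
    if err == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(peak / err))


def mutual_information(a, b, bins=256):
    """Mutual information in bits, estimated from a joint intensity histogram."""
    hist, _, _ = np.histogram2d(
        np.ravel(a), np.ravel(b), bins=bins, range=[[0, 256], [0, 256]]
    )
    pxy = hist / hist.sum()              # joint distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of b, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for a reference image and a fused result.
    ref = rng.integers(0, 256, size=(64, 64))
    fused = np.clip(ref + rng.normal(0.0, 2.0, size=ref.shape), 0, 255)
    print(rmse(ref, fused), psnr(ref, fused), mutual_information(ref, fused))
```

In fusion papers, MI is often reported as the sum of the mutual information between each source image and the fused image; whether that convention applies here is not stated in the abstract.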
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, with remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.