Authors: Shengchuan Jiang, Shanchuan Yu
DOI: 10.1016/j.asoc.2025.113507
Journal: Applied Soft Computing, Volume 181, Article 113507 (published 2025-06-20; Impact Factor 6.6; JCR Q1, Computer Science, Artificial Intelligence)
Source: https://www.sciencedirect.com/science/article/pii/S156849462500818X
Optimizing multi-focus image fusion through convolutional attention vision transformers and spatial consistency models
Multi-Focus Image Fusion (MFIF) aims to combine multiple source images of the same scene into a single all-in-focus image while discarding defocused pixels. Removing defocused pixels is challenging because traditional approaches struggle to detect defocused regions accurately. In this paper, a Convolutional Attention Vision Transformer-based Iterative Multi-Scale Fusion Network (CAViT-IMSFN) is proposed for MFIF. The collected input images are preprocessed with normalization, data imputation, and augmentation to improve the model's generalization ability and reduce overfitting. The convolutional MobileNetV2 model extracts local features, and the AViT model extracts global features from the preprocessed images. In the integration stage, a spatial attention model combines the local and global features while preserving spatial consistency. To further enhance image quality and spatial consistency, an iterative refinement model driven by a feedback mechanism updates the fused output iteratively. A gradient boosting optimization algorithm adjusts the weights, and a multi-scale fusion model identifies the focused and defocused portions of the images. The proposed model improves network adaptability by capturing multi-scale features and can handle varying levels of detail and complexity. The experimental evaluation uses diverse performance measures and quantitative analyses. The proposed model attained a Normalized Mutual Information (NMI) of 1.435 and a computational time of 0.78 s. The results show that the proposed model outperforms other existing MFIF approaches.
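The integration stage described above, where a spatial attention model combines local (CNN) and global (transformer) features while preserving spatial consistency, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the feature shapes, the channel-mean saliency measure, and the two-way softmax weighting are all illustrative assumptions.

```python
import numpy as np

def spatial_attention_fuse(local_feat, global_feat):
    """Fuse local and global feature maps of shape (H, W, C) with a
    per-pixel attention weight, so the blend varies spatially."""
    # Per-pixel saliency: mean activation magnitude across channels
    # (an assumed, simple choice of saliency measure).
    s_local = np.abs(local_feat).mean(axis=-1, keepdims=True)
    s_global = np.abs(global_feat).mean(axis=-1, keepdims=True)
    # Softmax over the two saliency maps yields weights in (0, 1)
    # that sum to 1 at every spatial location.
    e_l, e_g = np.exp(s_local), np.exp(s_global)
    w_local = e_l / (e_l + e_g)
    return w_local * local_feat + (1.0 - w_local) * global_feat

rng = np.random.default_rng(0)
local = rng.normal(size=(8, 8, 16))   # stand-in for MobileNetV2 features
glob = rng.normal(size=(8, 8, 16))    # stand-in for transformer features
fused = spatial_attention_fuse(local, glob)
print(fused.shape)  # (8, 8, 16)
```

Because the weights are a convex combination at each pixel, every fused value stays between the two source activations, which is one simple way to keep the fused map spatially consistent with its inputs.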
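The identification of focused and defocused portions can likewise be sketched with a classic baseline rather than the paper's multi-scale network: a Laplacian-energy focus measure and a pixel-wise decision map. The window size, the synthetic test pattern, and the crude "defocus" model (attenuated detail) are all assumptions for illustration.

```python
import numpy as np

def laplacian_energy(img, k=3):
    """Local sharpness: squared Laplacian response, box-averaged over a
    k x k window. Sharp regions score high, defocused regions low."""
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    e = lap ** 2
    pad = k // 2
    ep = np.pad(e, pad, mode="edge")
    out = np.zeros_like(e)
    for dy in range(k):          # sum the k x k neighborhood
        for dx in range(k):
            out += ep[dy:dy + e.shape[0], dx:dx + e.shape[1]]
    return out

def fuse_pair(a, b):
    """Decision map: at each pixel keep whichever source is sharper."""
    mask = laplacian_energy(a) >= laplacian_energy(b)
    return np.where(mask, a, b)

# Synthetic example: each source is sharp in one half, "defocused"
# (detail attenuated) in the other half.
x = np.linspace(0, 4 * np.pi, 64)
sharp = np.sin(np.outer(x, x))      # high-frequency reference pattern
blur = 0.2 * sharp                  # crude stand-in for defocus
cols = np.arange(64)[None, :]
left_focus = np.where(cols < 32, sharp, blur)
right_focus = np.where(cols < 32, blur, sharp)
fused = fuse_pair(left_focus, right_focus)
```

Away from the seam, the decision map recovers the sharp half of each source, so the fused result is much closer to the all-in-focus reference than either input; an iterative refinement stage like the one described above would then clean up residual errors near region boundaries.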
Journal introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is on publishing the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.