Authors: Shengchuan Jiang, Shanchuan Yu
DOI: 10.1016/j.asoc.2025.113507
Journal: Applied Soft Computing, Volume 181, Article 113507 (published 2025-06-20; Impact Factor 6.6; JCR Q1, Computer Science, Artificial Intelligence)
Source: https://www.sciencedirect.com/science/article/pii/S156849462500818X
Optimizing multi-focus image fusion through convolutional attention vision transformers and spatial consistency models
Multi-Focus Image Fusion (MFIF) aims to combine multiple source images of the same scene into a single all-in-focus image while discarding defocused pixels. Removing defocused pixels is challenging because traditional approaches struggle to detect defocused regions accurately. In this paper, a Convolutional Attention Vision Transformer-based Iterative Multi-Scale Fusion Network (CAViT-IMSFN) is proposed for MFIF. The collected input images are preprocessed with normalization, data imputation, and augmentation to improve the model's generalization ability and reduce overfitting. The convolutional MobileNetV2 model extracts local features, and the AViT model extracts global features from the preprocessed images. In the integration stage, a spatial attention model combines the local and global features while preserving spatial consistency. To further enhance image quality and spatial consistency, an iterative refinement model driven by a feedback mechanism updates the fused output iteratively. A gradient boosting optimization algorithm adjusts the weights, and a multi-scale fusion model identifies the focused and defocused portions of the images. The proposed model improves network adaptability by capturing multi-scale features and can handle varying levels of detail and complexity. The experimental evaluation uses diverse performance measures and quantitative analyses. The proposed model attained a Normalized Mutual Information (NMI) of 1.435 and a computational time of 0.78 s. The results show that the proposed model outperforms other existing MFIF approaches.
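The integration stage described above, where a spatial attention model combines local (CNN) and global (transformer) features while preserving spatial consistency, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the feature shapes, the channel-mean saliency measure, and the two-way softmax weighting are all illustrative assumptions.

```python
import numpy as np

def spatial_attention_fuse(local_feat, global_feat):
    """Fuse local and global feature maps of shape (H, W, C) with a
    per-pixel attention weight, so the blend varies spatially."""
    # Per-pixel saliency: mean activation magnitude across channels
    # (an assumed, simple choice of saliency measure).
    s_local = np.abs(local_feat).mean(axis=-1, keepdims=True)
    s_global = np.abs(global_feat).mean(axis=-1, keepdims=True)
    # Softmax over the two saliency maps yields weights in (0, 1)
    # that sum to 1 at every spatial location.
    e_l, e_g = np.exp(s_local), np.exp(s_global)
    w_local = e_l / (e_l + e_g)
    return w_local * local_feat + (1.0 - w_local) * global_feat

rng = np.random.default_rng(0)
local = rng.normal(size=(8, 8, 16))   # stand-in for MobileNetV2 features
glob = rng.normal(size=(8, 8, 16))    # stand-in for transformer features
fused = spatial_attention_fuse(local, glob)
print(fused.shape)  # (8, 8, 16)
```

Because the weights are a convex combination at each pixel, every fused value stays between the two source activations, which is one simple way to keep the fused map spatially consistent with its inputs.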
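The identification of focused and defocused portions can likewise be sketched with a classic baseline rather than the paper's multi-scale network: a Laplacian-energy focus measure and a pixel-wise decision map. The window size, the synthetic test pattern, and the crude "defocus" model (attenuated detail) are all assumptions for illustration.

```python
import numpy as np

def laplacian_energy(img, k=3):
    """Local sharpness: squared Laplacian response, box-averaged over a
    k x k window. Sharp regions score high, defocused regions low."""
    lap = (-4 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    e = lap ** 2
    pad = k // 2
    ep = np.pad(e, pad, mode="edge")
    out = np.zeros_like(e)
    for dy in range(k):          # sum the k x k neighborhood
        for dx in range(k):
            out += ep[dy:dy + e.shape[0], dx:dx + e.shape[1]]
    return out

def fuse_pair(a, b):
    """Decision map: at each pixel keep whichever source is sharper."""
    mask = laplacian_energy(a) >= laplacian_energy(b)
    return np.where(mask, a, b)

# Synthetic example: each source is sharp in one half, "defocused"
# (detail attenuated) in the other half.
x = np.linspace(0, 4 * np.pi, 64)
sharp = np.sin(np.outer(x, x))      # high-frequency reference pattern
blur = 0.2 * sharp                  # crude stand-in for defocus
cols = np.arange(64)[None, :]
left_focus = np.where(cols < 32, sharp, blur)
right_focus = np.where(cols < 32, blur, sharp)
fused = fuse_pair(left_focus, right_focus)
```

Away from the seam, the decision map recovers the sharp half of each source, so the fused result is much closer to the all-in-focus reference than either input; an iterative refinement stage like the one described above would then clean up residual errors near region boundaries.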
Journal introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is on publishing the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.