InceptionWTMNet: A hybrid network for Alzheimer’s Disease detection using wavelet transform convolution and Mixed Local Channel Attention on finely fused multimodal images
IF 4.2 · CAS Tier 3, Computer Science · JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zenan Xu, Zhengyao Bai, Han Ma, Mingqiang Xu, Qiqin Huang, Tao Lin
{"title":"InceptionWTMNet: A hybrid network for Alzheimer’s Disease detection using wavelet transform convolution and Mixed Local Channel Attention on finely fused multimodal images","authors":"Zenan Xu, Zhengyao Bai, Han Ma, Mingqiang Xu, Qiqin Huang, Tao Lin","doi":"10.1016/j.imavis.2025.105693","DOIUrl":null,"url":null,"abstract":"<div><div>Multimodal fusion has emerged as a critical technique for the diagnosis of Alzheimer’s Disease (AD), with the aim of effectively extracting and utilising complementary information from diverse modalities. Current fusion methods frequently cause the precise alignment of source images and do not adequately address parallax issues. This oversight can result in artifacts during the fusion process when images are misaligned. In response to this challenge, we propose a refined registration fusion technique, termed MURF, which integrates multimodal image registration and fusion within a cohesive framework. The Vision Transformer (ViT) has inspired the application of large-kernel convolutions in the diagnosis of Alzheimer’s disease (AD) because of its ability to model long-range dependencies. This approach aims to expand the receptive field and enhance the performance of diagnostic models. Despite requiring a minimal number of floating-point operations (FLOPs), these deep operators encounter challenges associated with over-parameterisation because of high memory access costs, which ultimately compromises computational efficiency. By utilising wavelet transform convolutions (WTConv), we decompose large-kernel depth-wise convolutions into four parallel branches. One branch employs a wavelet-transform convolution with square kernels, while the other two branches incorporate orthogonal wavelet-transform kernels with an identity mapping. This innovative method, with a Mixed Local Channel Attention mechanism, has facilitated the development of the InceptionWTConvolutions network. This network maintains a receptive field comparable to that of large-kernel convolutions, while concurrently minimising over-parameterisation and enhancing computational efficiency. InceptionWTMNet classified AD, MCI, and NC using MRI and PET data from ADNI dataset with 98.69% accuracy, 98.65% recall, 98.70% F1-score, and 98.98% AUC. and provide Graphical abstract in correct format.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"163 ","pages":"Article 105693"},"PeriodicalIF":4.2000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Image and Vision Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0262885625002811","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Multimodal fusion has emerged as a critical technique for the diagnosis of Alzheimer’s Disease (AD), with the aim of effectively extracting and utilising complementary information from diverse modalities. Current fusion methods frequently assume precise alignment of the source images and do not adequately address parallax, an oversight that produces artifacts when the inputs are misaligned. In response to this challenge, we propose a refined registration–fusion technique, termed MURF, which integrates multimodal image registration and fusion within a cohesive framework. The Vision Transformer (ViT), with its ability to model long-range dependencies, has inspired the use of large-kernel convolutions in AD diagnosis, aiming to expand the receptive field and enhance the performance of diagnostic models. Although they require relatively few floating-point operations (FLOPs), these large depth-wise operators suffer from over-parameterisation and high memory access costs, which ultimately compromise computational efficiency. Using wavelet transform convolutions (WTConv), we decompose the large-kernel depth-wise convolution into four parallel branches: one branch applies a wavelet-transform convolution with a square kernel, two branches use orthogonal band-shaped wavelet-transform kernels, and the remaining branch is an identity mapping. Combining this decomposition with a Mixed Local Channel Attention (MLCA) mechanism yields the InceptionWTConvolutions network, which retains a receptive field comparable to that of large-kernel convolutions while reducing over-parameterisation and improving computational efficiency. On MRI and PET data from the ADNI dataset, InceptionWTMNet classifies AD, mild cognitive impairment (MCI), and normal controls (NC) with 98.69% accuracy, 98.65% recall, 98.70% F1-score, and 98.98% AUC.
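To make the decomposition concrete, below is a minimal, self-contained PyTorch sketch of a single-level Haar wavelet-transform depthwise convolution and the four-branch Inception-style channel split the abstract describes. The module names (HaarWTConv, InceptionWTBlock), the channel-split ratio, the band-kernel size, and the use of plain depth-wise convolutions in the two band branches are illustrative assumptions, not the paper's implementation; the MURF registration stage and the MLCA attention module are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HaarWTConv(nn.Module):
    """Depthwise convolution in the Haar wavelet domain:
    fixed Haar DWT -> small depthwise conv on the sub-bands -> inverse DWT.
    Single decomposition level; input height/width must be even."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.channels = channels
        haar = 0.5 * torch.tensor([
            [[1., 1.], [1., 1.]],    # LL (approximation)
            [[1., 1.], [-1., -1.]],  # LH (horizontal detail)
            [[1., -1.], [1., -1.]],  # HL (vertical detail)
            [[1., -1.], [-1., 1.]],  # HH (diagonal detail)
        ])
        # One copy of the four orthonormal filters per channel: (4C, 1, 2, 2).
        self.register_buffer("filt", haar.repeat(channels, 1, 1).unsqueeze(1))
        self.conv = nn.Conv2d(4 * channels, 4 * channels, kernel_size,
                              padding=kernel_size // 2, groups=4 * channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Forward Haar DWT as a fixed stride-2 depthwise conv: (C,H,W) -> (4C,H/2,W/2).
        y = F.conv2d(x, self.filt, stride=2, groups=self.channels)
        y = self.conv(y)  # small learnable depthwise conv on the sub-bands
        # Orthonormal filters, so the transposed conv is the exact inverse DWT.
        return F.conv_transpose2d(y, self.filt, stride=2, groups=self.channels)


class InceptionWTBlock(nn.Module):
    """Four parallel depthwise branches over a channel split:
    identity | wavelet-domain square kernel | 1xk band | kx1 band."""

    def __init__(self, channels: int, band_kernel: int = 11,
                 branch_ratio: float = 0.125):
        super().__init__()
        gc = int(channels * branch_ratio)  # channels per convolutional branch
        self.split = (channels - 3 * gc, gc, gc, gc)
        self.wt_square = HaarWTConv(gc)
        self.band_w = nn.Conv2d(gc, gc, (1, band_kernel),
                                padding=(0, band_kernel // 2), groups=gc)
        self.band_h = nn.Conv2d(gc, gc, (band_kernel, 1),
                                padding=(band_kernel // 2, 0), groups=gc)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_id, x_sq, x_w, x_h = torch.split(x, self.split, dim=1)
        return torch.cat((x_id, self.wt_square(x_sq),
                          self.band_w(x_w), self.band_h(x_h)), dim=1)


if __name__ == "__main__":
    block = InceptionWTBlock(channels=64)
    print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```

Because the four Haar filters form an orthonormal basis of each 2x2 block, the transposed convolution in HaarWTConv inverts the transform exactly, so the branch preserves spatial resolution while a small kernel applied on the half-resolution sub-bands covers a larger effective receptive field than the same kernel in the pixel domain.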
About the journal:
Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.