Interpretable wavelet transformer-enhanced framework for unsupervised deformable image registration
Authors: Xinhao Bai, Hongpeng Wang, Yanding Qin, Jianda Han, Ningbo Yu
Journal: Medical Physics, 52(10), e70056 (Journal Article; impact factor 3.2)
Published: 2025-10-01
DOI: 10.1002/mp.70056 (https://doi.org/10.1002/mp.70056)
Citations: 0
Abstract
Background: Deformable image registration (DIR) underpins quantitative analysis in clinical image-based diagnosis and intervention. However, prevailing techniques struggle to capture high-frequency, multi-scale information. They also lack explicit constraints on the deformation learning process, which leads to poor interpretability.
Purpose: To address these challenges, we propose WaveMorph, a DIR framework enhanced by discrete wavelet Transformers.
Methods: The WaveMorph framework is built from wavelet-based modules with explicit, interpretable mathematical formulations. Specifically, we designed the Discrete Wavelet Transformer (DWFormer) module for the encoder, which helps capture high-frequency, multi-scale details and enables information-preserving feature encoding. We also devised the Inverse Wavelet Transform Up-sampling (IWTU)-enhanced decoder, which accumulates high-frequency, multi-scale information from the encoder to reconstruct the displacement vector field precisely in a coarse-to-fine manner.
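As a rough illustration of why a wavelet transform can serve as information-preserving down-sampling, and its inverse as up-sampling, the sketch below implements a single-level 2D Haar transform in NumPy. It is not the authors' DWFormer or IWTU module: the Haar wavelet, the 2D setting, the function names `haar_dwt2` and `haar_idwt2`, and the sub-band names are all assumptions made for this example.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level 2D orthonormal Haar transform (illustrative helper).

    Expects a 2D array with even height and width. Returns four
    half-resolution sub-bands: one low-frequency approximation and
    three high-frequency detail bands.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0        # low-frequency content
    detail_cols = (a - b + c - d) / 2.0   # differences across columns
    detail_rows = (a + b - c - d) / 2.0   # differences across rows
    detail_diag = (a - b - c + d) / 2.0   # diagonal differences
    return approx, detail_cols, detail_rows, detail_diag


def haar_idwt2(approx, detail_cols, detail_rows, detail_diag):
    """Inverse of haar_dwt2: doubles the resolution using all four sub-bands."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=approx.dtype)
    x[0::2, 0::2] = (approx + detail_cols + detail_rows + detail_diag) / 2.0
    x[0::2, 1::2] = (approx - detail_cols + detail_rows - detail_diag) / 2.0
    x[1::2, 0::2] = (approx + detail_cols - detail_rows - detail_diag) / 2.0
    x[1::2, 1::2] = (approx - detail_cols - detail_rows + detail_diag) / 2.0
    return x


# Toy 8x8 "image" standing in for a feature map (random data, not a real scan).
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

ll, dc, dr, dd = haar_dwt2(img)       # down-sample: 8x8 -> four 4x4 sub-bands
recon = haar_idwt2(ll, dc, dr, dd)    # up-sample: four 4x4 sub-bands -> 8x8

energy_in = np.sum(img ** 2)
energy_out = sum(np.sum(band ** 2) for band in (ll, dc, dr, dd))
```

Unlike strided convolution or pooling, the four sub-bands together hold exactly as many values as the input, so `recon` matches `img` up to floating-point error and the total energy is unchanged. A decoder that receives the high-frequency sub-bands from the encoder can, in principle, restore detail that a plain interpolation-based up-sampler would have to guess.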
Results: Comparative and ablation experiments were conducted on publicly available datasets, including OASIS, IXI, LPBA40, and MMWHS. Compared to state-of-the-art (SOTA) methods such as TransMorph, TransMatch, and UTSRMorph, our proposed method demonstrated superior performance.
Conclusions: The experimental results show that the wavelet transformer-based network is effective in deformable MRI registration due to its ability to capture multi-scale features and its strong interpretability.