Preoperative evaluation frequently relies on multi-modal medical imaging to provide comprehensive anatomical and functional insights. However, acquiring such multi-modal data often involves high scanning costs and logistical challenges. Moreover, in practice it is often impractical to collect enough matched multi-modal data to train a separate model for each cross-modality synthesis task.
To address these issues, we propose a novel dual-branch architecture, named Mamba-Convolutional UNet, for multi-modal medical image synthesis. Furthermore, to retain cross-modal synthesis capability under data scarcity, we introduce a simple reprogramming layer that addresses the practical challenge of limited paired multi-modal training data.
The proposed Mamba-Convolutional UNet adopts a U-shaped architecture featuring parallel SSM and convolutional branches. The SSM branch leverages Mamba to capture long-range dependencies and global context, while the convolutional branch extracts fine-grained local features through spatial operations. An attention mechanism then integrates the global and local features. To enhance adaptability across modalities with limited data, a lightweight reprogramming layer is incorporated into the Mamba module, allowing knowledge transfer from one cross-modal synthesis task to another without extensive retraining.
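To make the block structure concrete, the following is a minimal PyTorch sketch of one dual-branch block with attention-based fusion and a reprogramming layer. The abstract does not specify implementation details, so every name and design choice here (the GRU stand-in for the Mamba SSM branch, the prototype-based `ReprogrammingLayer`, the channel-attention gate) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of one dual-branch block; not the authors' code.
import torch
import torch.nn as nn


class ReprogrammingLayer(nn.Module):
    """Lightweight adapter (assumed design): cross-attends token features
    onto a small set of learnable target-modality prototypes."""

    def __init__(self, dim: int, n_prototypes: int = 64):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, C) token sequence; queries come from the input,
        # keys/values from the learned prototypes.
        proto = self.prototypes.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.attn(x, proto, proto)
        return x + out  # residual so the layer starts near identity


class MambaConvBlock(nn.Module):
    """Parallel SSM and convolutional branches fused by channel attention.
    The SSM branch is stood in for by a GRU here to stay self-contained;
    a real implementation would use a Mamba layer (e.g. mamba_ssm.Mamba)."""

    def __init__(self, dim: int):
        super().__init__()
        self.ssm = nn.GRU(dim, dim, batch_first=True)  # stand-in for Mamba
        self.reprogram = ReprogrammingLayer(dim)
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        # Squeeze-and-excitation-style gate weighting global vs. local features.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * dim, dim, 1), nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C)
        g, _ = self.ssm(tokens)                   # global context branch
        g = self.reprogram(g)                     # target-modality adapter
        g = g.transpose(1, 2).reshape(b, c, h, w)
        l = self.conv(x)                          # local feature branch
        a = self.gate(torch.cat([g, l], dim=1))   # attention fusion weights
        return a * g + (1 - a) * l


if __name__ == "__main__":
    block = MambaConvBlock(dim=32)
    y = block(torch.randn(2, 32, 64, 64))
    print(y.shape)  # torch.Size([2, 32, 64, 64])
```

Under this reading, fine-tuning for a new synthesis task could update only the reprogramming layer's prototypes while the rest of the block stays frozen, which would account for the reduced data requirement; the paper itself does not state this procedure.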
We conducted five multi-modal medical image synthesis tasks on three datasets to validate the performance of our model. The results demonstrate that Mamba-Convolutional UNet significantly outperforms six baseline models. Moreover, when fine-tuned for other synthesis tasks with only 25% of the training data, Mamba-Convolutional UNet attains performance comparable to current state-of-the-art methods.
The proposed Mamba-Convolutional UNet features a dual-branch structure that effectively combines global and local features for enhanced medical image understanding, and the reprogramming layer in its Mamba block addresses the challenge of target-modality transformation when training data are insufficient.