ConvTNet fusion: A robust transformer-CNN framework for multi-class classification, multimodal feature fusion, and tissue heterogeneity handling
Tariq Mahmood, Tanzila Saba, Amjad Rehman, Faten S. Alamri
DOI: 10.1016/j.compmedimag.2025.102621
Computerized Medical Imaging and Graphics, Volume 125, Article 102621 (published 2025-08-22)

Abstract: Medical imaging is crucial for clinical practice, providing insight into organ structure and function. Advances in imaging technologies enable automated image segmentation, which is essential for accurate diagnosis and treatment planning. However, challenges such as class imbalance, tissue boundary delineation, and the complexity of tissue interactions persist. This study introduces ConvTNet, a hybrid model that combines Transformer and CNN features to improve renal CT image segmentation, using attention mechanisms and feature fusion techniques to enhance precision. ConvTNet uses the KC module to focus on critical image regions, enabling precise delineation of noisy and ambiguous tissue boundaries. The Mix-KFCA module enhances feature fusion by combining multi-scale features and distinguishing healthy kidney tissue from surrounding structures. The study also proposes preprocessing strategies, including noise reduction, data augmentation, and image normalization, that optimize image quality and ensure reliable inputs for accurate segmentation. ConvTNet further employs transfer learning, fine-tuning five pre-trained models to bolster performance and leverage knowledge from a wide range of feature extraction techniques. Empirical evaluations demonstrate that ConvTNet performs exceptionally well in multi-label classification and lesion segmentation, with an AUC of 0.9970, sensitivity of 0.9942, DSC of 0.9533, and accuracy of 0.9921, demonstrating its efficacy for precise renal cancer diagnosis.

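As a rough illustration of the kind of CNN-Transformer feature fusion the abstract describes, the sketch below gates a CNN feature map with cross-attention over Transformer tokens. The class name SimpleCNNTransformerFusion and all shapes are my own assumptions; this is not the paper's KC or Mix-KFCA module.

    import torch
    import torch.nn as nn

    class SimpleCNNTransformerFusion(nn.Module):
        """Illustrative fusion of a CNN feature map with Transformer tokens
        (hypothetical module, not ConvTNet's published implementation)."""
        def __init__(self, channels=256, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
            self.norm = nn.LayerNorm(channels)
            self.gate = nn.Sequential(nn.Linear(2 * channels, channels), nn.Sigmoid())

        def forward(self, cnn_feat, trans_tokens):
            # cnn_feat: (B, C, H, W); trans_tokens: (B, N, C)
            b, c, h, w = cnn_feat.shape
            q = cnn_feat.flatten(2).transpose(1, 2)            # (B, H*W, C) queries from CNN
            ctx, _ = self.attn(q, trans_tokens, trans_tokens)  # cross-attention to Transformer tokens
            g = self.gate(torch.cat([q, ctx], dim=-1))         # per-position gate in [0, 1]
            fused = self.norm(q + g * ctx)                     # gated residual fusion
            return fused.transpose(1, 2).reshape(b, c, h, w)

    fusion = SimpleCNNTransformerFusion(channels=256)
    out = fusion(torch.randn(2, 256, 16, 16), torch.randn(2, 64, 256))
    print(out.shape)  # torch.Size([2, 256, 16, 16])
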
FastDIP: An effective approach for accelerating unsupervised low-count PET image reconstruction
Jinming Li, Jing Wang, Yang Lv, Puming Zhang, Jun Zhao
DOI: 10.1016/j.compmedimag.2025.102639
Computerized Medical Imaging and Graphics, Volume 124, Article 102639 (published 2025-08-21)

Abstract:
Introduction: Unsupervised deep learning methods can improve the image quality of positron emission tomography (PET) images without the need for large-scale datasets. However, these approaches typically require training a distinct network for each patient, making the reconstruction process extremely time-consuming and limiting their clinical applicability. The objective of this work is to develop an efficient framework for unsupervised PET image reconstruction that meets the clinical requirement for real-time imaging.
Methods: We present FastDIP, an efficient learning method for unsupervised low-count PET image reconstruction. FastDIP employs a two-stage reconstruction process, beginning with a rapid coarse reconstruction followed by a detailed fine reconstruction. A pixel-shuffle downsampling method is used to compress PET images and enable quick coarse reconstruction. Additionally, a wavelet-denoised PET image serves as the input, replacing the traditional anatomical images. Pre-training techniques are also incorporated to accelerate network convergence.
Results: FastDIP was evaluated on simulated 18F-AV45 brain datasets, as well as clinical 18F-FDG brain and clinical 68Ga-PSMA body datasets, and compared to Deep Image Prior (DIP), Conditional Deep Image Prior (CDIP), Guided Deep Image Prior (GDIP), Self-supervised Pre-training DIP (SPDIP), Population Pre-training DIP (PPDIP), and various ablation methods. On the 18F-AV45 dataset, FastDIP achieved better image quality than DIP using only 11% of its training time across different count levels. On the 18F-FDG dataset, it achieved the lowest normalized mean square error and the highest structural similarity in just 2.2 min, outperforming DIP (10.7 min), CDIP (7.5 min), GDIP (9.8 min), SPDIP (166.7 min), and PPDIP (166.7 min). On the 68Ga-PSMA dataset, FastDIP achieved the highest contrast-to-noise ratio and SUVmax in 2.4 min, surpassing DIP (10.7 min), CDIP (32.0 min), GDIP (32.7 min), SPDIP (16.7 min), and PPDIP (37.5 min).
Conclusion: FastDIP is an efficient approach for unsupervised low-count PET image reconstruction that significantly reduces network training time and markedly enhances image restoration performance.

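The pixel-shuffle downsampling idea mentioned in the Methods section can be illustrated with PyTorch's built-in PixelUnshuffle/PixelShuffle pair: compress the spatial grid, process it cheaply, then restore full resolution. This is a minimal sketch assuming a single 2D slice and a toy two-layer body; it is not FastDIP's actual coarse stage.

    import torch
    import torch.nn as nn

    class CoarseStage(nn.Module):
        """Illustrative coarse stage: pixel-unshuffle to a smaller grid,
        denoise cheaply, then pixel-shuffle back (hypothetical, not FastDIP's code)."""
        def __init__(self, scale=2, width=32):
            super().__init__()
            self.down = nn.PixelUnshuffle(scale)   # (B,1,H,W) -> (B,scale^2,H/s,W/s)
            self.body = nn.Sequential(
                nn.Conv2d(scale * scale, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, scale * scale, 3, padding=1),
            )
            self.up = nn.PixelShuffle(scale)       # back to full resolution

        def forward(self, x):
            return self.up(self.body(self.down(x)))

    coarse = CoarseStage(scale=2)
    pet_slice = torch.randn(1, 1, 128, 128)        # dummy low-count PET slice
    print(coarse(pet_slice).shape)                 # torch.Size([1, 1, 128, 128])
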
AlzFormer: Multi-modal framework for Alzheimer’s classification using MRI and graph-embedded demographics guided by adaptive attention gating
Sayyed Shahid Hussain, Xu Degang, Pir Masoom Shah, Hikmat Khan, Adnan Zeb
DOI: 10.1016/j.compmedimag.2025.102638
Computerized Medical Imaging and Graphics, Volume 124, Article 102638 (published 2025-08-20)

Abstract: Alzheimer’s disease (AD) is the most common progressive neurodegenerative disorder and the fifth-leading cause of death in older people. Detecting AD is very challenging for clinicians and radiologists because of the complex nature of the disease, which calls for automatic data-driven machine-learning models to enhance diagnostic accuracy and support expert decision-making. However, machine learning models for AD classification are hindered by three key limitations: (i) diffuse and subtle structural changes in the brain that make it difficult to capture global pathology; (ii) non-uniform alterations across MRI planes, which limit single-view learning; and (iii) a lack of deep integration of demographic context, which is often ignored despite its clinical importance. To address these challenges, we propose a novel multi-modal deep learning framework, named AlzFormer, that dynamically integrates 3D MRI with demographic features represented as knowledge graph embeddings for AD classification. Specifically, (i) a 3D CNN captures global and volumetric features; (ii) three parallel 2D CNNs model plane-specific information through tri-planar processing (axial, coronal, sagittal), combined with a Transformer encoder; and (iii) demographic context is incorporated as knowledge graph embeddings through a novel Adaptive Attention Gating mechanism that balances the contributions of the two modalities (MRI and demographics). Comprehensive experiments on two real-world datasets, including generalization tests, ablation studies, and robustness evaluation under noisy conditions, demonstrate that the proposed model provides a robust and effective solution for AD diagnosis. These results suggest strong potential for integration into Clinical Decision Support Systems (CDSS), offering a more interpretable and personalized approach to early Alzheimer’s detection.

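Adaptive attention gating between imaging and demographic features might, in its simplest form, look like the sketch below: both inputs are projected to a common space and a learned sigmoid gate decides, per dimension, how much each modality contributes. The module name AdaptiveGate and the dimensions are illustrative assumptions, not AlzFormer's published code.

    import torch
    import torch.nn as nn

    class AdaptiveGate(nn.Module):
        """Illustrative gating between an imaging feature vector and a demographic
        embedding (hypothetical sketch, not the paper's module)."""
        def __init__(self, img_dim=512, demo_dim=32, out_dim=256):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, out_dim)
            self.demo_proj = nn.Linear(demo_dim, out_dim)
            self.gate = nn.Sequential(nn.Linear(2 * out_dim, out_dim), nn.Sigmoid())

        def forward(self, img_feat, demo_emb):
            xi, xd = self.img_proj(img_feat), self.demo_proj(demo_emb)
            g = self.gate(torch.cat([xi, xd], dim=-1))   # learned per-dimension weight
            return g * xi + (1.0 - g) * xd               # convex combination of modalities

    gate = AdaptiveGate()
    fused = gate(torch.randn(4, 512), torch.randn(4, 32))
    print(fused.shape)  # torch.Size([4, 256])
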
Encoder-shared visual state space network for anterior segment reconstruction
Guiping Qian, Huaqiong Wang, Shan Luo, Yiming Sun, Dingguo Yu, Xiaodiao Chen, Fan Zhang
DOI: 10.1016/j.compmedimag.2025.102631
Computerized Medical Imaging and Graphics, Volume 124, Article 102631 (published 2025-08-16)

Abstract: Three-dimensional (3D) reconstruction of the anterior segment from AS-OCT scanning devices is essential for diagnosing and monitoring the cornea and iris, as well as for localizing and quantifying keratitis. However, this process faces two significant challenges: (1) the consecutive images acquired through rotational scanning are difficult to align and register; and (2) existing medical image segmentation methods cannot effectively segment the cornea. Both steps are critical preprocessing for effective 3D visualization of the anterior segment. To tackle these dual challenges, we propose an encoder-shared visual state space network for 3D reconstruction of the anterior segment. The network integrates image alignment and segmentation into a unified framework, employing the same encoder to process spatial images for both tasks. A visual state space projection method computes the homography matrix of adjacent images, thereby facilitating their alignment. Furthermore, we introduce a channel-wise visual state space fusion technique in conjunction with a decoder block that captures complex contextual interdependencies, enhances shape-preserving feature representation, and improves segmentation accuracy. Based on the resulting corneal segmentation, we accurately reconstruct 3D volume data from the aligned images. Experimental results on the AIDK-Align and CORNEA datasets demonstrate that the proposed method performs remarkably well in anterior segment alignment, corneal segmentation, and 3D reconstruction. We also compared the encoder-shared visual state space network with state-of-the-art medical image segmentation methods and image alignment algorithms, highlighting its advantages in both alignment and segmentation precision. Our code will be made available at https://github.com/qianguiping/Es-VSS.

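A shared encoder serving both an alignment head (here, regressing 4-point homography offsets) and a segmentation decoder can be sketched as below. This toy stand-in illustrates only the encoder-sharing idea; the visual state space blocks and fusion decoder of Es-VSS are not reproduced, and the class name SharedEncoderTwoTasks is mine.

    import torch
    import torch.nn as nn

    class SharedEncoderTwoTasks(nn.Module):
        """Toy shared-encoder design: one encoder feeds a homography-regression
        head (alignment) and a segmentation head (illustrative only)."""
        def __init__(self, width=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, width, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.homography_head = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(width * 4, 8),   # 4-point corner offsets of a homography
            )
            self.seg_head = nn.Sequential(
                nn.Conv2d(width * 2, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, 1, 1),
                nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            )

        def forward(self, img_a, img_b):
            fa, fb = self.encoder(img_a), self.encoder(img_b)
            offsets = self.homography_head(torch.cat([fa, fb], dim=1))  # alignment branch
            mask = self.seg_head(fa)                                    # segmentation branch
            return offsets, mask

    net = SharedEncoderTwoTasks()
    off, mask = net(torch.randn(1, 1, 256, 256), torch.randn(1, 1, 256, 256))
    print(off.shape, mask.shape)  # torch.Size([1, 8]) torch.Size([1, 1, 256, 256])
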
A multi-backbone cascade and morphology-aware segmentation network for complex morphological X-ray coronary artery images
Xiaodong Zhou, Huibin Wang, Lili Zhang
DOI: 10.1016/j.compmedimag.2025.102629
Computerized Medical Imaging and Graphics, Volume 125, Article 102629 (published 2025-08-14)

Abstract: X-ray coronary angiography is the "gold standard" for diagnosing coronary artery disease, but because of the complex morphology of the coronary arteries, such as overlapping, tortuous vessels and uneven contrast-medium filling, existing segmentation methods often suffer from segmentation errors and vessel breakage. To this end, we propose a multi-backbone cascade and morphology-aware segmentation network (MBCMA-Net), which improves the feature extraction ability of the network through multi-backbone encoders, embeds a vascular morphology-aware module in the backbone network to enhance the recognition of complex structures, and introduces a centerline loss function to maintain vascular connectivity. For the experiments, we selected and annotated 1942 clear angiograms from two public datasets (DCA1 and CADICA), and also used the public ARCADE dataset for testing. Experimental results show that MBCMA-Net reaches an IoU of 87.14%, a DSC of 92.72%, and a vascular connectivity score of 89.05%, outperforming mainstream segmentation algorithms, and can serve as a benchmark model for coronary artery segmentation.
Code repository: https://gitee.com/zaleman/mbcma-net

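One published way to implement a connectivity-preserving centerline loss is the clDice formulation (Shit et al.), which extracts a differentiable soft skeleton with iterated min/max pooling and scores skeleton/mask overlap. The sketch below follows that recipe as an assumption; whether MBCMA-Net's centerline loss matches it exactly is not stated in the abstract.

    import torch
    import torch.nn.functional as F

    def soft_erode(x):    # min-pooling via negated max-pooling
        return -F.max_pool2d(-x, 3, 1, 1)

    def soft_dilate(x):
        return F.max_pool2d(x, 3, 1, 1)

    def soft_skeleton(x, iters=10):
        """Differentiable skeleton via clDice-style morphological iterations."""
        skel = F.relu(x - soft_dilate(soft_erode(x)))
        for _ in range(iters):
            x = soft_erode(x)
            delta = F.relu(x - soft_dilate(soft_erode(x)))
            skel = skel + F.relu(delta - skel * delta)
        return skel

    def centerline_dice_loss(pred, target, iters=10, eps=1e-6):
        """pred, target: (B, 1, H, W) probabilities / binary masks in [0, 1]."""
        sp, st = soft_skeleton(pred, iters), soft_skeleton(target, iters)
        tprec = (sp * target).sum() / (sp.sum() + eps)   # topology precision
        tsens = (st * pred).sum() / (st.sum() + eps)     # topology sensitivity
        return 1.0 - 2.0 * tprec * tsens / (tprec + tsens + eps)

    pred = torch.rand(1, 1, 64, 64)
    target = (torch.rand(1, 1, 64, 64) > 0.5).float()
    print(centerline_dice_loss(pred, target))
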
JointDiffusion: Joint representation learning for generative, predictive, and self-explainable AI in healthcare
Joanna Kaleta, Paweł Skierś, Jan Dubiński, Przemysław Korzeniowski, Tomasz Trzciński, Jakub M. Tomczak, Kamil Deja
DOI: 10.1016/j.compmedimag.2025.102619
Computerized Medical Imaging and Graphics, Volume 124, Article 102619 (published 2025-08-14)

Abstract: Joint machine learning models that can both synthesize and classify data often deliver uneven performance across the two tasks or are unstable to train. In this work, we start from a set of empirical observations indicating that the internal representations built by contemporary deep diffusion-based generative models are useful not only for generation but also for prediction. We then propose to extend the vanilla diffusion model with a classifier that allows stable joint end-to-end training with shared parameterization between the two objectives. The resulting joint diffusion model outperforms recent state-of-the-art hybrid methods in both classification and generation quality on all evaluated benchmarks. Building on this joint training approach, we present its application to the medical domain and show how it can help with problems that are crucial for medical data. Our joint diffusion model achieves superior performance in a semi-supervised setup, where human annotation is scarce, while at the same time providing decision explanations through the generation of counterfactual examples.

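The shared-parameterization idea, one trunk trained end-to-end for both denoising and classification, can be illustrated with a toy model whose bottleneck features feed both a noise-prediction decoder and a linear classifier, optimized with a summed loss. All names and the single fixed noise level are simplifications of my own, not the authors' diffusion parameterization.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyJointDenoiser(nn.Module):
        """Toy joint model: a denoising trunk whose pooled bottleneck features
        also drive a classifier head (illustrates weight sharing only)."""
        def __init__(self, classes=2, width=32):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Conv2d(1, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, width, 3, stride=2, padding=1), nn.ReLU())
            self.dec = nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv2d(width, 1, 3, padding=1))
            self.cls = nn.Linear(width, classes)

        def forward(self, x_noisy):
            h = self.enc(x_noisy)
            eps_hat = self.dec(h)                    # predicted noise
            logits = self.cls(h.mean(dim=(2, 3)))    # classifier on shared features
            return eps_hat, logits

    def joint_loss(model, x0, y, sigma=0.5):
        noise = torch.randn_like(x0)
        eps_hat, logits = model(x0 + sigma * noise)  # single fixed noise level for brevity
        return F.mse_loss(eps_hat, noise) + F.cross_entropy(logits, y)

    model = TinyJointDenoiser(classes=3)
    loss = joint_loss(model, torch.randn(4, 1, 32, 32), torch.randint(0, 3, (4,)))
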
DB-SNet: A dual branch network for aortic component segmentation and lesion localization
Mingliang Yang, Jinhao Lyu, Jianxing Hu, Xiangbing Bian, Yue Zhang, Sulian Su, Xin Lou
DOI: 10.1016/j.compmedimag.2025.102592
Computerized Medical Imaging and Graphics, Volume 124, Article 102592 (published 2025-08-14)

Abstract: Accurate segmentation of aortic components, such as lumen, calcification, and false lumen, and associated lesions, including aneurysm, stenosis, and dissection, in CT angiography (CTA) scans is crucial for cardiovascular diagnosis and treatment planning. However, most existing automated methods generate binary masks with limited clinical utility and rely on separate computational pipelines for anatomical and lesion segmentation, resulting in higher resource demands. To address these limitations, we propose DB-SNet, a dual-branch 3D segmentation network based on the MedNeXt architecture. The model incorporates a shared encoder and task-specific decoders, enhanced by a novel channel-space cross-fusion module that facilitates effective feature interaction between the two branches. A systematic ablation study was conducted to assess the impact of different backbone architectures, information interaction strategies, and loss weight configurations on dual-task performance. Evaluated on 435 multi-center CTA cases for training and 493 external cases for validation, DB-SNet outperformed 15 state-of-the-art models, achieving the highest average scores on the Dice Similarity Coefficient (DSC: 0.615) and Intersection over Union (IoU: 0.524) metrics. Compared to the current best-performing method (MedNeXt), DB-SNet reduced model parameters by 64.8% and computational complexity by 36.4%, while achieving a 30.801× inference speedup (37.985 s vs. 1170 s for manual annotation). This work introduces a new paradigm for efficient and integrated aortic analysis. By balancing model efficiency and accuracy, DB-SNet offers a robust solution for real-time, resource-constrained clinical environments. Our dataset and code can be accessed at https://github.com/yml-bit/DB-SNet.

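A channel-space cross-fusion between two decoder branches could be realized as below: each branch is re-weighted by channel and spatial attention maps computed from the other branch. This CrossFusion class is a hypothetical reading of the idea, not the module in the DB-SNet repository.

    import torch
    import torch.nn as nn

    class CrossFusion(nn.Module):
        """Illustrative channel + spatial attention exchange between two 3D
        branch feature maps (hypothetical, not the published module)."""
        def __init__(self, channels):
            super().__init__()
            self.ch_a = nn.Sequential(nn.AdaptiveAvgPool3d(1),
                                      nn.Conv3d(channels, channels, 1), nn.Sigmoid())
            self.ch_b = nn.Sequential(nn.AdaptiveAvgPool3d(1),
                                      nn.Conv3d(channels, channels, 1), nn.Sigmoid())
            self.sp_a = nn.Sequential(nn.Conv3d(channels, 1, 1), nn.Sigmoid())
            self.sp_b = nn.Sequential(nn.Conv3d(channels, 1, 1), nn.Sigmoid())

        def forward(self, feat_a, feat_b):
            # each branch is modulated by the other's channel and spatial attention
            a = feat_a * self.ch_b(feat_b) * self.sp_b(feat_b)
            b = feat_b * self.ch_a(feat_a) * self.sp_a(feat_a)
            return a, b

    fuse = CrossFusion(channels=64)
    a, b = fuse(torch.randn(1, 64, 8, 16, 16), torch.randn(1, 64, 8, 16, 16))
    print(a.shape, b.shape)  # both torch.Size([1, 64, 8, 16, 16])
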
Enhancing cardiac function assessment: Developing and validating a domain adaptive framework for automating the segmentation of echocardiogram videos
Mojdeh Nazari, Hassan Emami, Reza Rabiei, Hamid Reza Rabiee, Arsalan Salari, Hossein Sadr
DOI: 10.1016/j.compmedimag.2025.102627
Computerized Medical Imaging and Graphics, Volume 124, Article 102627 (published 2025-08-13)

Abstract:
Background: Accurate segmentation of echocardiographic images is essential for assessing cardiac function, particularly for calculating key metrics such as ejection fraction. However, challenges such as domain discrepancy, noisy data, anatomical variability, and complex imaging conditions often hinder the performance of deep learning models in this domain.
Objective: To propose and validate a domain adaptive segmentation framework for automating the segmentation of echocardiographic images across diverse imaging conditions and modalities.
Method: The framework integrates a Variational AutoEncoder (VAE) for structured latent representation and a Wasserstein GAN (WGAN)-based domain alignment module to reduce feature distribution gaps. These components were selected for their complementary roles: the VAE ensures stable reconstruction and domain-invariant encoding, while the WGAN aligns source and target feature distributions. The framework also incorporates depthwise separable convolutions for computational efficiency and employs PixelShuffle layers in the decoder for high-resolution reconstruction. Experiments were conducted on two publicly available echocardiographic datasets, CAMUS and EchoNet-Dynamic, as well as a newly collected local dataset from Heshmat Hospital, Guilan, Iran, for external evaluation of the model's performance under varying imaging conditions and scanner types. The framework was evaluated with metrics such as Dice scores, Jaccard indices, and Hausdorff distances. A qualitative assessment involving two board-certified cardiologists with extensive experience in echocardiographic interpretation was also conducted to evaluate the clinical relevance and anatomical plausibility of the framework's segmentation outputs.
Results: The proposed framework achieves Dice scores of 84.6% (CAMUS → EchoNet-Dynamic) and 89.1% (EchoNet-Dynamic → CAMUS), outperforming recent state-of-the-art UDA methods. When the Heshmat dataset is used as the target domain, the model maintains strong performance, achieving Dice scores of 83.0% (EchoNet-Dynamic → Heshmat) and 84.1% (CAMUS → Heshmat). All results were statistically significant (p < 0.01) compared to the top-performing baseline.
Conclusion: By addressing critical challenges in echocardiographic segmentation, the proposed UDA framework could offer a significant advancement in this field. Its ability to handle domain discrepancy, noisy data, and anatomical variability makes it a reliable tool for cardiac health assessment.

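Two of the efficiency choices named in the Method section, depthwise separable convolutions and PixelShuffle upsampling, are standard PyTorch building blocks. A minimal decoder block combining them might look like the sketch below; the class names, widths, and shapes are assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        """Depthwise + pointwise convolution, commonly used to cut FLOPs."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
            self.pointwise = nn.Conv2d(in_ch, out_ch, 1)

        def forward(self, x):
            return self.pointwise(self.depthwise(x))

    class ShuffleUpBlock(nn.Module):
        """Decoder block that upsamples with PixelShuffle instead of a
        transposed convolution (illustrative sketch only)."""
        def __init__(self, in_ch, out_ch, scale=2):
            super().__init__()
            self.expand = DepthwiseSeparableConv(in_ch, out_ch * scale * scale)
            self.shuffle = nn.PixelShuffle(scale)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.shuffle(self.expand(x)))

    up = ShuffleUpBlock(128, 64)
    print(up(torch.randn(1, 128, 28, 28)).shape)  # torch.Size([1, 64, 56, 56])
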
Inference time correction based on confidence and uncertainty for improved deep-learning model performance and explainability in medical image classification
Joel Jeffrey, Ashwin RajKumar, Sudhanshu Pandey, Lokesh Bathala, Phaneendra K. Yalavarthy
DOI: 10.1016/j.compmedimag.2025.102630
Computerized Medical Imaging and Graphics, Volume 125, Article 102630 (published 2025-08-13)

Abstract: The major challenges faced by artificial intelligence (AI) models for medical image analysis are class imbalance in the training data and limited explainability. This study introduces the Confidence and Entropy-based Uncertainty Thresholding Algorithm (CEbUTAl), a novel post-processing method designed to enhance both model performance and explainability. CEbUTAl modifies model predictions during inference, based on uncertainty and confidence measures, to improve classification in scenarios with class imbalance. Its inference-time correction addresses explainability while simultaneously improving performance, contrary to the prevailing notion that explainability necessitates a compromise in performance. The algorithm was evaluated across five medical imaging tasks: intracranial hemorrhage detection, optical coherence tomography analysis, breast cancer detection, carpal tunnel syndrome detection, and multi-class skin lesion classification. Results demonstrate that CEbUTAl improves accuracy by approximately 5% and increases sensitivity across multiple deep learning architectures, loss functions, and tasks. Comparative studies indicate that CEbUTAl outperforms state-of-the-art methods in addressing class imbalance and quantifying uncertainty. The model-agnostic, task-agnostic, post-processing nature of CEbUTAl makes it appealing for enhancing both performance and trustworthiness in medical image analysis. This study provides a generalizable approach to mitigating biases arising from class imbalance while improving the explainability of AI models, thereby increasing their utility in clinical practice.

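Without access to the exact CEbUTAl rule, only the general shape of confidence- and entropy-based inference-time thresholding can be sketched: compute softmax confidence and normalized entropy per case, and flag predictions that fail either threshold. The function name, thresholds, and the "flag for review" fallback below are illustrative choices, not the published algorithm.

    import numpy as np

    def entropy_confidence_filter(probs, conf_thr=0.7, ent_thr=0.5):
        """Accept a prediction only when softmax confidence is high and
        normalized entropy is low; otherwise flag it as uncertain.
        Illustrative post-processing sketch, not CEbUTAl itself."""
        probs = np.asarray(probs, dtype=float)                 # (N, C) class probabilities
        conf = probs.max(axis=1)
        ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        ent /= np.log(probs.shape[1])                          # normalize entropy to [0, 1]
        accept = (conf >= conf_thr) & (ent <= ent_thr)
        preds = probs.argmax(axis=1)
        return preds, accept                                   # accept=False -> uncertain case

    preds, accept = entropy_confidence_filter([[0.9, 0.1], [0.55, 0.45]])
    print(preds, accept)  # [0 0] [ True False]
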
C5-net: Cross-organ cross-modality cswin-transformer coupled convolutional network for dual task transfer learning in lymph node segmentation and classification
Meng Wang, Haobo Chen, Lijuan Mao, Weiwei Jiao, Hong Han, Qi Zhang
DOI: 10.1016/j.compmedimag.2025.102633
Computerized Medical Imaging and Graphics, Volume 124, Article 102633 (published 2025-08-11)

Abstract: Deep learning has made notable strides in the ultrasonic diagnosis of lymph nodes, yet it faces three primary challenges: a limited number of lymph node images and a scarcity of annotated data; difficulty in comprehensively learning both local and global semantic information; and obstacles to collaborative learning of image segmentation and classification for accurate diagnosis. To address these issues, we propose the Cross-organ Cross-modality Cswin-transformer Coupled Convolutional Network (C5-Net). First, we design a cross-organ, cross-modality transfer learning strategy that leverages skin lesion dermoscopic images, which have abundant annotations and share similarities in field of view and morphology with lymph node ultrasound images. Second, we couple a Transformer with a convolutional network to comprehensively learn both local details and global information. Third, the encoder weights in the C5-Net are shared between the segmentation and classification tasks to exploit their synergistic knowledge, enhancing overall performance in ultrasound lymph node diagnosis. Our study uses 690 lymph node ultrasound images and 1000 skin lesion dermoscopic images. Experimental results show that C5-Net achieves the best segmentation and classification performance for lymph nodes among advanced methods, with a segmentation Dice coefficient of 0.854 and a classification accuracy of 0.874. Our method has consistently shown accuracy and robustness in the segmentation and classification of lymph nodes, contributing to the early and accurate detection of lymph node malignancy, which is potentially essential for effective treatment planning in clinical oncology.

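The dual-task weight sharing and cross-domain transfer described above can be illustrated by a toy network with one encoder and two heads, where the encoder weights are initialized from a source-domain model via load_state_dict. The DualTaskNet class is a stand-in of my own; C5-Net's CSwin-Transformer/CNN coupling is not reproduced here.

    import torch
    import torch.nn as nn

    class DualTaskNet(nn.Module):
        """Shared encoder with a segmentation head and a classification head
        (toy stand-in used only to illustrate weight sharing and transfer)."""
        def __init__(self, width=32, classes=2):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, width, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU())
            self.seg_head = nn.Sequential(
                nn.Conv2d(width * 2, 1, 1),
                nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False))
            self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                          nn.Linear(width * 2, classes))

        def forward(self, x):
            f = self.encoder(x)                  # shared representation for both tasks
            return self.seg_head(f), self.cls_head(f)

    # cross-organ transfer: copy encoder weights from a source-domain model
    source = DualTaskNet()                       # pretend this was trained on dermoscopy
    target = DualTaskNet()
    target.encoder.load_state_dict(source.encoder.state_dict())
    mask, logits = target(torch.randn(1, 1, 224, 224))
    print(mask.shape, logits.shape)              # torch.Size([1, 1, 224, 224]) torch.Size([1, 2])
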