{"title":"CS U-NET: A Medical Image Segmentation Method Integrating Spatial and Contextual Attention Mechanisms Based on U-NET","authors":"Zhang Fanyang, Zhang Fan","doi":"10.1002/ima.70072","DOIUrl":"https://doi.org/10.1002/ima.70072","url":null,"abstract":"<div>\u0000 \u0000 <p>Medical image segmentation is a crucial process in medical image analysis, with convolutional neural network (CNN)-based methods achieving notable success in recent years. Among these, U-Net has gained widespread use due to its simple yet effective architecture. However, CNNs still struggle to capture global, long-range semantic information. To address this limitation, we present CS U-NET, a novel method built upon Swin-U-Net, which integrates spatial and contextual attention mechanisms. This hybrid approach combines the strengths of both transformers and U-Net architectures to enhance segmentation performance. In this framework, tokenized image patches are processed through a transformer-based U-shaped encoder-decoder, enabling the learning of both local and global semantic features via skip connections. Our method achieves a Dice Similarity Coefficient of 78.64% and a 95% Hausdorff distance of 21.25 on the Synapse multiorgan segmentation dataset, outperforming Trans-U-Net and other state-of-the-art U-Net variants by 4% and 6%, respectively. The experimental results highlight the significant improvements in prediction accuracy and edge detail preservation provided by our approach.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143707268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foreground Background Difference Knowledge-Based Small Sample Target Segmentation for Image-Guided Radiation Therapy","authors":"Yuanzhi Cheng, Pengfei Zhang, Chang Liu, Changyong Guo, Shinichi Tamura","doi":"10.1002/ima.70075","DOIUrl":"https://doi.org/10.1002/ima.70075","url":null,"abstract":"<p>The aim of this paper is to exploit a small sample (data scarcity) target segmentation technique for image-guided radiation therapy. The technique is grounded on a prototype-based approach—widely used small sample segmentation method. In this paper, we propose a foreground–background difference knowledge learning framework to perform the small sample target segmentation task. Its main differences from the traditional prototype-based approaches and novel contributions may be enumerated in two aspects: (1) A subdivision strategy to generate multiple foreground–background prototypes for each class in the support images, and the generated prototype is used to build a collection of query foreground and background prototypes. (2) A cross-prototype attention module to learn the correlation and difference knowledge of inter-class prototypes and transfer the knowledge to the query prototype for iterative updates. The main advantage of our framework is that: (1) the intra-class prototype set can comprehensively reflect the class features, avoiding the high computational complexity caused by dense matching; and (2) knowledge of inter-class differences provides comprehensive foreground–background segmentation information, greatly supporting accurate segmentation of the query set. In the 5-shot SegRap dataset experiment, the proposed model achieved Dice coefficients of 82.23% in the same-domain setting and 81.01% in the cross-domain setting. Similarly, in the 5-shot HECKTOR2022 dataset experiment, it achieved 83.59% in the same-domain setting and 81.48% in the cross-domain setting. For the 5-shot BTCV and CHAOS datasets, the model attained Dice coefficients of 79.00% and 79.70%, respectively. These results demonstrate the model's accuracy, efficiency, and generalization. This study presents a significant advancement in medical image segmentation by introducing a prototype-based model that effectively addresses data scarcity. By leveraging intra- and inter-class attention mechanisms, the model ensures robust generalization and reliable performance across datasets, paving the way for efficient and precise clinical applications with minimal reliance on large annotated datasets.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bottom Double Branch Path Networks With Confidence Calibration for Intracranial Aneurysms Detection in 3D MRA","authors":"Shuhuinan Zheng, Qichang Fu, Wei Jin, Xiaomei Xu, Jianqing Wang, Xiaobo Lai, Lilin Guo","doi":"10.1002/ima.70071","DOIUrl":"https://doi.org/10.1002/ima.70071","url":null,"abstract":"<div>\u0000 \u0000 <p>Intracranial aneurysms (IAs) are characterized by abnormal dilation of the brain blood vessel wall, the rupture of which often leads to subarachnoid hemorrhage with a high mortality rate. Current detections rely heavily on radiologists' interpretation of magnetic resonance angiography (MRA) images, but manual identification is time-consuming and laborious. Therefore, it is urgent to carry out automatic detection tools for IAs, and various intelligent models have been developed in recent years. However, the size of IAs is relatively small compared with the high voxel resolution MRA images, and thus the data imbalance leads to a high false positive (FP) rate. To address these challenges, we have proposed an innovative 3D voxel detection framework based on Feature Pyramid Network (FPN) architecture, which is called bottom double branch path network with confidence calibration (BCOC for short). BCOC shows better effects on small objects for preserving diversities of feature maps and also creates efficient feature extractors by reducing the number of channels per layer, making it particularly advantageous for handling large three-dimensional resolutions. Additionally, optimal transport (OT) has been applied for matching the detection and ground truth bounding boxes during the post-process phase to refine bounding box positions, thereby further improving the detection performance. Moreover, the confidence score of model output is calibrated via calibration loss during training to make correct detections with higher confidence and wrong detections with lower confidence, which can reduce the FP rate. Our proposed model achieves mean average precision (AP) of 0.8186 and 0.8533, sensitivity of 93.91% and 98.43%, FPs/case of 0.1332 and 0.0541 on two public MRA datasets including cases with IAs collected from different hospitals, respectively, outperforming other state-of-the-art methods. The results show that BCOC is a promising detection method for IAs automatic recognition.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IMDF-Net: Iterative U-Net With Multi-Kernel Dilated Convolution and Fusion Modules for Enhanced Retinal Vessel Segmentation","authors":"Jiale Deng, Lina Yang, Yuwen Lin","doi":"10.1002/ima.70073","DOIUrl":"https://doi.org/10.1002/ima.70073","url":null,"abstract":"<div>\u0000 \u0000 <p>In the early diagnosis of diabetic retinopathy, the morphological properties of blood vessels serve as an important reference for doctors to assess a patient's condition, facilitating scientific diagnostic and therapeutic interventions. However, vascular deformations, proliferation, and rupture caused by retinal diseases are often difficult to detect in the early stages. The assessment of retinal vessel morphology is subjective, time-consuming, and heavily dependent on the professional experience of the physician. Therefore, computer-aided diagnostic systems have gradually played a significant role in this field. Existing neural networks, particularly U-Net and its variants, have shown promising results in retinal vessel segmentation. However, due to the information loss caused by multiple pooling operations and the insufficient handling of local contextual features in skip connections, most segmentation methods still face challenges in accurately detecting microvessels. To address these limitations and assist medical staff in the early diagnosis of retinal diseases, we propose an iterative retinal vessel segmentation network with multi-dimensional attention and multi-scale feature fusion, named IMDF-Net. The network consists of a backbone network and an iterative refinement network. In the backbone network, we have designed a cascaded multi-kernel dilated convolution module and a multi-scale feature fusion module during the upsampling phase. These components expand the receptive field, effectively combine global information and local features, and propagate deep features to the shallow layers. Additionally, we have designed an iterative network to further capture missing information and correct erroneous segmentation results. Experimental results demonstrate that IMDF-Net outperforms several state-of-the-art methods on the DRIVE dataset, achieving the best performance across all evaluation metrics. On the CHASE_DB1 dataset, it achieves optimal performance in four metrics. It demonstrates its superiority in both overall performance and visual results, with a significant improvement in the segmentation of microvessels.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale Feature Fusion Booster Network for Segmentation of Colorectal Polyp","authors":"Malik Abdul Manan, Jinchao Feng, Shahzad Ahmed, Abdul Raheem","doi":"10.1002/ima.70068","DOIUrl":"https://doi.org/10.1002/ima.70068","url":null,"abstract":"<div>\u0000 \u0000 <p>Addressing the challenges posed by colorectal polyp variability and imaging inconsistencies in endoscopic images, we propose the multiscale feature fusion booster network (MFFB-Net), a novel deep learning (DL) framework for the semantic segmentation of colorectal polyps to aid in early colorectal cancer detection. Unlike prior models, such as the pyramid vision transformer-based cascaded attention decoder (PVT-CASCADE) and the parallel reverse attention network (PraNet), MFFB-Net enhances segmentation accuracy and efficiency through a unique fusion of multiscale feature extraction in both the encoder and decoder stages, coupled with a booster module for refining fine-grained details and a bottleneck module for efficient feature compression. The network leverages multipath feature extraction with skip connections, capturing both local and global contextual information, and is rigorously evaluated on seven benchmark datasets, including Kvasir, CVC-ClinicDB, CVC-ColonDB, ETIS, CVC-300, BKAI-IGH, and EndoCV2020. MFFB-Net achieves state-of-the-art (SOTA) performance, with Dice scores of 94.38%, 91.92%, 91.21%, 80.34%, 82.67%, 76.92%, and 74.29% on CVC-ClinicDB, Kvasir, CVC-300, ETIS, CVC-ColonDB, EndoCV2020, and BKAI-IGH, respectively, outperforming existing models in segmentation accuracy and computational efficiency. MFFB-Net achieves real-time processing speeds of 26 FPS with only 1.41 million parameters, making it well suited for real-world clinical applications. The results underscore the robustness of MFFB-Net, demonstrating its potential for real-time deployment in computer-aided diagnosis systems and setting a new benchmark for automated polyp segmentation.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEDCnet: A Memory Efficient Approach for Processing High-Resolution Fundus Images for Diabetic Retinopathy Classification Using CNN","authors":"Mohsin Butt, D. N. F. NurFatimah, Majid Ali Khan, Ghazanfar Latif, Abul Bashar","doi":"10.1002/ima.70063","DOIUrl":"https://doi.org/10.1002/ima.70063","url":null,"abstract":"<p>Modern medical imaging equipment can capture very high-resolution images with detailed features. These high-resolution images have been used in several domains. Diabetic retinopathy (DR) is a medical condition where increased blood sugar levels of diabetic patients affect the retinal vessels of the eye. The usage of high-resolution fundus images in DR classification is quite limited due to Graphics processing unit (GPU) memory constraints. The GPU memory problem becomes even worse with the increased complexity of the current state-of-the-art deep learning models. In this paper, we propose a memory-efficient divide-and-conquer-based approach for training deep learning models that can identify both high-level and detailed low-level features from high-resolution images within given GPU memory constraints. The proposed approach initially uses the traditional transfer learning technique to train the deep learning model with reduced-sized images. This trained model is used to extract detailed low-level features from fixed-size patches of higher-resolution fundus images. These detailed features are then utilized for classification based on standard machine learning algorithms. We have evaluated our proposed approach using the DDR and APTOS datasets. The results of our approach are compared with different approaches, and our model achieves a maximum classification accuracy of 95.92% and 97.39% on the DDR and APTOS datasets, respectively. In general, the proposed approach can be used to get better accuracy by using detailed features from high-resolution images within GPU memory constraints.</p>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/ima.70063","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Path Multi-Scale CNN for Precise Classification of Non-Small Cell Lung Cancer","authors":"Vidhi Bishnoi, Lavanya, Palak Handa, Nidhi Goel","doi":"10.1002/ima.70066","DOIUrl":"https://doi.org/10.1002/ima.70066","url":null,"abstract":"<div>\u0000 \u0000 <p>Non-Small Cell Lung Cancer (NSCLC) has the highest cancer-related mortality rate worldwide. While biopsy-based diagnosis is critical for prognosis and treatment, the intricate anatomical features in Whole Slide Images (WSIs) make manual classification challenging for pathologists. Current deep learning models have been developed to aid in the automatic classification of NSCLC, but many rely on extensive manual annotations and lack efficient multi-scale feature extraction, limiting their ability to capture diverse patterns in WSIs. There is a need to explore multipath, multi-scale Convolutional Neural Networks (CNN) that can effectively capture these diverse patterns in WSIs. This study proposes a novel deep learning model, a Multi-scale, Dual-Path CNN (MDP-CNN), designed to automatically classify NSCLC subtypes by capturing heterogeneous patterns and features in WSIs. The model was trained on two independent datasets, LC25000 and The Cancer Genome Atlas (TCGA), demonstrating notable improvements in performance metrics, achieving accuracy scores of 0.981 and 0.958, Area Under Curve (AUC) scores of 0.978 and 0.995, and kappa scores of 0.957 and 0.903 for the LC25000 and TCGA datasets, respectively. Extensive analyses, including ablation studies, interpretation plots, and cross-dataset analysis, were conducted to demonstrate the efficacy of the proposed model. Multi-scale processing improved the model's precision in classifying lung cancer subtypes by capturing variations in histopathological features across different resolutions. The proposed model outperformed state-of-the-art models by approximately 8% in accuracy and 3% in AUC, demonstrating the effectiveness of MDP CNNs in improving WSI-based diagnostics and supporting automated NSCLC classification and clinical decisions.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative Susceptibility Mapping MRI With Computer Vision Metrics to Reduce Scan Time for Brain Hemorrhage Assessment","authors":"Huiyu Huang, Shreyas Balaji, Bulent Aslan, Yan Wen, Magdy Selim, Ajith J. Thomas, Aristotelis Filippidis, Pascal Spincemaille, Yi Wang, Salil Soman","doi":"10.1002/ima.70070","DOIUrl":"https://doi.org/10.1002/ima.70070","url":null,"abstract":"<div>\u0000 \u0000 <p>Optimizing clinical imaging parameters balances scan time and image quality. quantitative susceptibility mapping (QSM) MRI, particularly, for detecting intracranial hemorrhage (ICH), involves multiple echo times (TEs), leading to longer scan durations that can impact patient comfort and imaging efficiency. This study evaluates the necessity of specific TEs for QSM MRI in ICH patients and identifies shorter scan protocols using computer vision metrics (CVMs) to maintain diagnostic accuracy. Fifty-four patients with suspected ICH were retrospectively recruited. multiecho gradient recalled echo (mGRE) sequences with 11 TEs were used for QSM MRI (reference). Subsets of TEs compatible with producing QSM MRI images were generated, producing 71 subgroups per patient. QSM images from each subgroup were compared to reference images using 14 CVMs. Linear regression and Wilcoxon signed-rank tests identified optimal subgroups minimizing scan time while preserving image quality as part of the computer vision optimized rapid imaging (CORI) method described. CVM-based analysis demonstrated Subgroup 1 (TE1-3) to be optimal using several CVMs, supporting a reduction in scan time from 4.5 to 1.23 min (73% reduction). Other CVMs suggested longer maximum TE subgroups as optimal, achieving scan time reductions of 9%–37%. Visual assessments by a neuroradiologist and trained research assistant confirmed no significant difference in ICH area measurements between reference and CORI-identified optimal subgroup-derived QSM, while CORI-identified worst subgroups derived QSM differed significantly (<i>p</i> < 0.05). The findings support using shorter QSM MRI protocols for ICH evaluation and suggest CVMs may aid optimization efforts for other clinical imaging protocols.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143688926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Transfer Learning Approach for Skin Cancer Classification on ISIC 2024 3D Total Body Photographs","authors":"Javed Rashid, Salah Mahmoud Boulaaras, Muhammad Shoaib Saleem, Muhammad Faheem, Muhammad Umair Shahzad","doi":"10.1002/ima.70065","DOIUrl":"https://doi.org/10.1002/ima.70065","url":null,"abstract":"<div>\u0000 \u0000 <p>Skin cancer, and melanoma in particular, is a significant public health issue in the modern era because of the exponential death rate. Previous research has used 2D data to detect skin cancer, and the present methods, such as biopsies, are arduous. Therefore, we need new, more effective models and tools to tackle current problems quickly. The main objective of the work is to improve the 3D ResNet50 model for skin cancer classification by transfer learning. Trained on the ISIC 2024 3D Total Body Photographs (3D-TBP), a Kaggle competition dataset, the model aims to detect five significant types of skin cancer: Melanoma (Mel), Melanocytic nevus (Nev), Basal cell carcinoma (BCC), Actinic keratosis (AK), and Benign keratosis (BK). While fine-tuning achieves peak performance, data augmentation addresses the issue of overfitting. The proposed model outperforms state-of-the-art methods with an overall accuracy of 93.88%. Since the accuracy drops to 85.67% while utilizing 2D data, the substantial contribution becomes apparent when working with 3D data. The model articulates excellent memory and precision with remarkable accuracy. According to the findings, the 3D ResNet50 model improves the diagnostic process and may be rated better than conventional approaches as a noninvasive, accurate, and efficient substitute. The current model is valuable because it can help with a significant clinical application: the early diagnosis of melanoma.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143689197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TS-Net: Trans-Scale Network for Medical Image Segmentation","authors":"HuiFang Wang, YaTong Liu, Jiongyao Ye, Dawei Yang, Yu Zhu","doi":"10.1002/ima.70064","DOIUrl":"https://doi.org/10.1002/ima.70064","url":null,"abstract":"<div>\u0000 \u0000 <p>Accurate medical image segmentation is crucial for clinical diagnosis and disease treatment. However, there are still great challenges for most existing methods to extract accurate features from medical images because of blurred boundaries and various appearances. To overcome the above limitations, we propose a novel medical image segmentation network named TS-Net that effectively combines the advantages of CNN and Transformer to enhance the feature extraction ability. Specifically, we design a Multi-scale Convolution Modulation (MCM) module to simplify the self-attention mechanism through a convolution modulation strategy that incorporates multi-scale large-kernel convolution into depth-separable convolution, effectively extracting the multi-scale global features and local features. Besides, we adopt the concept of feature complementarity to facilitate the interaction between high-level semantic features and low-level spatial features through the designed Scale Inter-active Attention (SIA) module. The proposed method is evaluated on four different types of medical image segmentation datasets, and the experimental results show its competence with other state-of-the-art methods. The method achieves an average Dice Similarity Coefficient (DSC) of 90.79% ± 1.01% on the public NIH dataset for pancreas segmentation, 76.62% ± 4.34% on the public MSD dataset for pancreatic cancer segmentation, 80.70% ± 6.40% on the private PROMM (Prostate Multi-parametric MRI) dataset for prostate cancer segmentation, and 91.42% ± 0.55% on the public Kvasir-SEG dataset for polyp segmentation. The experimental results across the four different segmentation tasks for medical images demonstrate the effectiveness of the Trans-Scale network.</p>\u0000 </div>","PeriodicalId":14027,"journal":{"name":"International Journal of Imaging Systems and Technology","volume":"35 2","pages":""},"PeriodicalIF":3.0,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143639173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}