Title: Convolutional Neural Networks for Segmentation of Pleural Mesothelioma: Analysis of Probability Map Thresholds (CALGB 30901, Alliance)
Authors: Mena Shenouda, Eyjólfur Gudmundsson, Feng Li, Christopher M. Straus, Hedy L. Kindler, Arkadiusz Z. Dudek, Thomas Stinchcombe, Xiaofei Wang, Adam Starkey, Samuel G. Armato III
Journal of Digital Imaging, published 2024-09-12. DOI: 10.1007/s10278-024-01092-z

Abstract: The purpose of this study was to evaluate the impact of probability map threshold on pleural mesothelioma (PM) tumor delineations generated using a convolutional neural network (CNN). One hundred eighty-six CT scans from 48 PM patients were segmented by a VGG16/U-Net CNN. A radiologist modified the contours generated at a 0.5 probability threshold. Percent difference in tumor volume and overlap measured by the Dice Similarity Coefficient (DSC) were compared between the radiologist-provided reference standard and CNN outputs for thresholds ranging from 0.001 to 0.9. CNN-derived contours consistently yielded smaller tumor volumes than radiologist contours. Reducing the probability threshold from 0.5 to 0.01 decreased the absolute percent volume difference, on average, from 42.93% to 26.60%. Median and mean DSC ranged from 0.57 to 0.59, with a peak at a threshold of 0.2; no distinct optimum was found for percent volume difference. The CNN exhibited deficiencies with specific disease presentations, such as severe pleural effusion or disease in the pleural fissure. No single output threshold in the CNN probability maps was optimal for both tumor volume and DSC. This study underscores the need to assess tumor volume and spatial overlap simultaneously when evaluating deep learning-based tumor segmentations across probability thresholds: while automated segmentations may yield tumor volumes comparable to those of the reference standard, the spatial region delineated by the CNN at a specific threshold is equally important.
Title: Automated Three-Dimensional Imaging and Pfirrmann Classification of Intervertebral Disc Using a Graphical Neural Network in Sagittal Magnetic Resonance Imaging of the Lumbar Spine
Authors: David Baur, Richard Bieck, Johann Berger, Patrick Schöfer, Tim Stelzner, Juliane Neumann, Thomas Neumuth, Christoph-E. Heyde, Anna Voelker
Journal of Digital Imaging, published 2024-09-12. DOI: 10.1007/s10278-024-01251-2

Abstract: This study aimed to develop a graph neural network (GNN) for automated three-dimensional (3D) magnetic resonance imaging (MRI) visualization and Pfirrmann grading of intervertebral discs (IVDs), and to benchmark it against manual classification. Lumbar IVD MRI data from 300 patients were retrospectively analyzed. Two clinicians performed the manual segmentation and grading, with inter-rater reliability assessed using Cohen's kappa. The IVDs were then processed and classified using an automated convolutional neural network (CNN)–GNN pipeline, and performance was evaluated using F1 scores. Manual Pfirrmann grading exhibited moderate agreement (κ = 0.455–0.565) between the clinicians, with higher exact-match frequencies at lower lumbar levels. Single-grade discrepancies were prevalent except at L5/S1. Automated segmentation of IVDs using a pretrained U-Net model achieved an F1 score of 0.85, with a precision and recall of 0.83 and 0.88, respectively. After reconstructing each automatically segmented IVD into a 3D point-cloud representation, the GNN model demonstrated moderate performance in Pfirrmann classification. The highest precision (0.81) and F1 score (0.71) were observed at L2/3, whereas the overall metrics indicated moderate performance (precision: 0.46, recall: 0.47, F1 score: 0.46), with variability across spinal levels. The integration of CNN and GNN offers a new perspective for automating IVD analysis in MRI. Although the current performance highlights the need for further refinement, the moderate accuracy of the model, combined with its 3D visualization capabilities, establishes a promising foundation for more advanced grading systems.
Title: Screening Patient Misidentification Errors Using a Deep Learning Model of Chest Radiography: A Seven Reader Study
Authors: Kiduk Kim, Kyungjin Cho, Yujeong Eo, Jeeyoung Kim, Jihye Yun, Yura Ahn, Joon Beom Seo, Gil-Sun Hong, Namkug Kim
Journal of Digital Imaging, published 2024-09-11. DOI: 10.1007/s10278-024-01245-0

Abstract: We aimed to evaluate the ability of deep learning (DL) models to identify patients from a paired chest radiograph (CXR) and to compare their performance with that of human experts. In this retrospective study, patient identification DL models were developed using 240,004 CXRs. The models were validated on multiple datasets comprising different populations: internal validation, CheXpert, and Chest ImaGenome (CIG). Model performance was analyzed with respect to disease change status. The models' ability to identify patients from paired CXRs was compared with that of three junior radiology residents (group I), two senior radiology residents (group II), and two board-certified expert radiologists (group III). For the reader study, 240 patients (age, 56.617 ± 13.690 years; 113 females; 160 same-patient pairs) were evaluated. A one-sided non-inferiority test was performed with a margin of 0.05. SimChest, our similarity-based DL model, demonstrated the best patient identification performance across all datasets, regardless of disease change status (area under the receiver operating characteristic curve: 0.992–0.999 for internal validation, 0.933–0.948 for CheXpert, and 0.949–0.951 for CIG). The radiologists identified patients from the paired CXRs with a mean accuracy of 0.900 (95% confidence interval: 0.852–0.948), with performance increasing with experience (mean accuracy: group I, 0.874; group II, 0.904; group III, 0.935; SimChest, 0.904). SimChest achieved non-inferior performance compared to the radiologists (P for non-inferiority: 0.015). The findings of this diagnostic study indicate that DL models can screen for patient misidentification using a pair of CXRs non-inferiorly to human experts.
Title: A Novel Network for Low-Dose CT Denoising Based on Dual-Branch Structure and Multi-Scale Residual Attention
Authors: Ju Zhang, Lieli Ye, Weiwei Gong, Mingyang Chen, Guangyu Liu, Yun Cheng
Journal of Digital Imaging, published 2024-09-11. DOI: 10.1007/s10278-024-01254-z

Abstract: Deep learning-based denoising of low-dose medical CT images has received great attention from both academic researchers and physicians in recent years and has shown important application value in clinical practice. In this work, a novel two-branch, multi-scale residual attention-based network for low-dose CT image denoising is proposed. It adopts a two-branch framework to extract and fuse image features at shallow and deep levels, respectively, recovering as much image texture and structure information as possible. We propose an adaptive dynamic convolution block (ADCB) in the local information extraction layer. It effectively extracts detailed information for low-dose CT denoising and enables the network to better capture local details and texture features, thereby improving the denoising effect and image quality. A multi-scale edge enhancement attention block (MEAB) is proposed in the global information extraction layer to perform feature fusion through dilated convolution and a multi-dimensional attention mechanism. A multi-scale residual convolution block (MRCB) is proposed to integrate feature information and improve the robustness and generalization of the network. To demonstrate the effectiveness of our method, extensive comparison experiments were conducted on two publicly available datasets. Our model achieves a PSNR of 29.3004, an SSIM of 0.8659, and an RMSE of 14.0284 on the AAPM-Mayo dataset. On the Qin_LUNG_CT dataset, it is evaluated at four added noise levels (σ = 15, 30, 45, and 60) and achieves the best results. Ablation studies show that the proposed ADCB, MEAB, and MRCB modules significantly improve denoising performance. The source code is available at https://github.com/Ye111-cmd/LDMANet.
Title: Sex-Specific Imaging Biomarkers for Parkinson's Disease Diagnosis: A Machine Learning Analysis
Authors: Yifeng Yang, Liangyun Hu, Yang Chen, Weidong Gu, Yuanzhong Xie, Shengdong Nie
Journal of Digital Imaging, published 2024-09-10. DOI: 10.1007/s10278-024-01235-2

Abstract: This study aimed to identify sex-specific imaging biomarkers for Parkinson's disease (PD) based on multiple MRI morphological features using machine learning methods. Participants were categorized into female and male subgroups, and various structural morphological features were extracted. An ensemble Lasso (EnLasso) method was employed to identify a stable optimal feature subset for each sex-based subgroup. Eight typical classifiers were adopted to construct models for classifying PD versus healthy controls (HC), to validate whether sex-specific models could improve the precision of PD identification. Finally, statistical analysis and correlation tests were carried out on significant brain region features to identify potential sex-specific imaging biomarkers. The best model (MLP) based on the female and male subgroups achieved average classification accuracies of 92.83% and 92.11%, respectively, better than the model based on the overall sample (86.88%) and the overall model incorporating sex as a factor (87.52%). In addition, the most discriminative feature of PD among males was the lh 6r (FD), whereas among females it was the lh PreS (GI). The findings indicate that the sex-specific PD diagnosis models yield significantly higher classification performance than previous models that included all participants. Additionally, the male subgroup exhibited a greater number of brain region changes than the female subgroup, suggesting sex-specific differences in PD risk markers. This study underscores the importance of stratifying data by sex and offers insights into sex-specific variations in PD phenotypes, which could aid the development of precise and personalized diagnostic approaches in the early stages of the disease.
Title: Enhancing Nasopharyngeal Carcinoma Survival Prediction: Integrating Pre- and Post-Treatment MRI Radiomics with Clinical Data
Authors: Luong Huu Dang, Shih-Han Hung, Nhi Thao Ngoc Le, Wei-Kai Chuang, Jeng-You Wu, Ting-Chieh Huang, Nguyen Quoc Khanh Le
Journal of Digital Imaging, published 2024-04-30. DOI: 10.1007/s10278-024-01109-7

Abstract: Recurrences are frequent in nasopharyngeal carcinoma (NPC) despite high remission rates with treatment, leading to considerable morbidity. This study aimed to develop a prediction model for NPC survival by harnessing both pre- and post-treatment magnetic resonance imaging (MRI) radiomics in conjunction with clinical data, with 3-year progression-free survival (PFS) as the primary outcome. Our approach involved retrospective collection of clinical and MRI data from 276 eligible NPC patients across three independent hospitals (180 in the training cohort, 46 in the validation cohort, and 50 in the external cohort), each of whom underwent MRI twice: within 2 months before treatment and within 10 months after treatment. From the contrast-enhanced T1-weighted images before and after treatment, 3404 radiomics features were extracted, derived not only from the primary lesion but also from the adjacent lymph nodes surrounding the tumor. We applied appropriate feature-selection pipelines, followed by Cox proportional hazards models for survival analysis. Model evaluation was performed using receiver operating characteristic (ROC) analysis, the Kaplan–Meier method, and nomogram construction. Our study unveiled several crucial predictors of NPC survival, notably highlighting the synergistic combination of pre- and post-treatment data in both the clinical and radiomics assessments. The prediction model demonstrated robust performance, with AUCs of 0.66 (95% CI: 0.536–0.779) in the training cohort, 0.717 (95% CI: 0.536–0.883) in the validation cohort, and 0.827 (95% CI: 0.684–0.948) in the external cohort. This study presents a novel and effective prediction model for NPC survival that leverages both pre- and post-treatment clinical data in conjunction with MRI features. The constructed nomogram has potentially significant implications for NPC research, offering clinicians a valuable tool for individualized treatment planning and patient counseling.
Title: Development of a Secure Web-Based Medical Imaging Analysis Platform: The AWESOMME Project
Authors: Tiphaine Diot-Dejonghe, Benjamin Leporq, Amine Bouhamama, Helene Ratiney, Frank Pilleul, Olivier Beuf, Frederic Cervenansky
Journal of Digital Imaging, published 2024-04-30. DOI: 10.1007/s10278-024-01110-0

Abstract: Precision medicine research benefits from machine learning in the creation of robust models adapted to the processing of patient data. This applies both to pathology identification in images (annotation or segmentation) and to computer-aided diagnosis for classification or prediction. It comes with a strong need to exploit and visualize large volumes of images and associated medical data. The work presented here follows on from a main case study piloted in a cancer center: an analysis pipeline for patients with osteosarcoma comprising segmentation, feature extraction, and application of a deep learning model to predict response to treatment. The main aim of the AWESOMME project is to leverage this work and implement the pipeline on an easy-to-access, secure web platform. The proposed web application is based on a three-component architecture: a data server, a heavy-computation and authentication server, and a medical imaging web framework with a user interface. These existing components have been enhanced to meet the security and traceability needs of continuous expert-data production. The platform innovates by covering all steps of medical imaging processing (visualization and segmentation, feature extraction, and aided diagnosis) and enables the testing and use of machine learning models. The infrastructure is operational, deployed in internal production, and currently being installed in the hospital environment. The extension of the case study and user feedback enabled us to fine-tune functionalities and showed that AWESOMME is a modular solution capable of analyzing medical data and sharing research algorithms with in-house clinicians.
Title: Automatic Skeleton Segmentation in CT Images Based on U-Net
Authors: Eva Milara, Adolfo Gómez-Grande, Pilar Sarandeses, Alexander P. Seiffert, Enrique J. Gómez, Patricia Sánchez-González
Journal of Digital Imaging, published 2024-04-30. DOI: 10.1007/s10278-024-01127-5

Abstract: Bone metastasis, emerging oncological therapies, and osteoporosis represent some of the distinct clinical contexts that can result in morphological alterations of bone structure. Visual assessment of these changes in anatomical images is considered suboptimal, emphasizing the importance of precise skeletal segmentation as a valuable aid to their evaluation. In the present study, a neural network model for automatic skeleton segmentation from two-dimensional computed tomography (CT) slices is proposed. A total of 77 CT images and their semi-manual skeleton segmentations, from two acquisition protocols (whole-body and femur-to-head), are used to form a training group and a testing group. Preprocessing of the images includes four main steps: stretcher removal, thresholding, image clipping, and normalization (with two different techniques: interpatient and intrapatient). Subsequently, five different sets are created and arranged in randomized order for the training phase. A neural network model based on the U-Net architecture is implemented with different numbers of channels per feature map and numbers of epochs. The best-performing model obtains a Jaccard index (IoU) of 0.959 and a Dice index of 0.979. The resultant model demonstrates the potential of deep learning applied to medical images and proves its utility in bone segmentation.
{"title":"Letter to the Editor Regarding Article “Prior to Initiation of Chemotherapy, Can We Predict Breast Tumor Response? Deep Learning Convolutional Neural Networks Approach Using a Breast MRI Tumor Dataset”","authors":"Joren Brunekreef","doi":"10.1007/s10278-024-01129-3","DOIUrl":"https://doi.org/10.1007/s10278-024-01129-3","url":null,"abstract":"<p>The cited article reports on a convolutional neural network trained to predict response to neoadjuvant chemotherapy from pre-treatment breast MRI scans. The proposed algorithm attains impressive performance on the test dataset with a mean Area Under the Receiver-Operating Characteristic curve of 0.98 and a mean accuracy of 88%. In this letter, I raise concerns that the reported results can be explained by inadvertent data leakage between training and test datasets. More precisely, I conjecture that the random split of the full dataset in training and test sets did not occur on a patient level, but rather on the level of 2D MRI slices. This allows the neural network to “memorize” a patient’s anatomy and their treatment outcome, as opposed to discovering useful features for treatment response prediction. To provide evidence for these claims, I present results of similar experiments I conducted on a public breast MRI dataset, where I demonstrate that the suspected data leakage mechanism closely reproduces the results reported on in the cited work.</p>","PeriodicalId":50214,"journal":{"name":"Journal of Digital Imaging","volume":"23 1","pages":""},"PeriodicalIF":4.4,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140840363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: UViT-Seg: An Efficient ViT and U-Net-Based Framework for Accurate Colorectal Polyp Segmentation in Colonoscopy and WCE Images
Authors: Yassine Oukdach, Anass Garbaz, Zakaria Kerkaou, Mohamed El Ansari, Lahcen Koutti, Ahmed Fouad El Ouafdi, Mouna Salihoun
Journal of Digital Imaging, published 2024-04-26. DOI: 10.1007/s10278-024-01124-8

Abstract: Colorectal cancer (CRC) is one of the most prevalent cancers worldwide. Accurate localization of colorectal polyps in endoscopy images is pivotal for timely detection and removal, contributing significantly to CRC prevention. Manual analysis of images generated by gastrointestinal screening technologies is a tedious task for doctors, so computer vision-assisted cancer detection could serve as an efficient tool for polyp segmentation. Numerous efforts have been dedicated to automating polyp localization, with the majority of studies relying on convolutional neural networks (CNNs) to learn features from polyp images. Despite their success in polyp segmentation tasks, CNNs exhibit significant limitations in precisely determining polyp location and shape because they rely solely on learning local features. Because gastrointestinal images vary widely in their features, encompassing both high- and low-level ones, a framework able to learn both types of features is desired. This paper introduces UViT-Seg, a framework designed for polyp segmentation in gastrointestinal images. Built on an encoder-decoder architecture, UViT-Seg employs two distinct feature extraction methods: a vision transformer in the encoder captures long-range semantic information, while a CNN module, integrating squeeze-excitation and dual attention mechanisms, captures low-level features, focusing on critical image regions. Experimental evaluations conducted on five public datasets (CVC clinic, ColonDB, Kvasir-SEG, ETIS LaribDB, and Kvasir Capsule-SEG) demonstrate UViT-Seg's effectiveness in polyp localization. To confirm its generalization performance, the model is tested on datasets not used in training. Benchmarked against common segmentation methods and state-of-the-art polyp segmentation approaches, the proposed model yields promising results; for instance, it achieves a mean Dice coefficient of 0.915 and a mean intersection over union of 0.902 on the CVC Colon dataset. Furthermore, UViT-Seg is efficient, requiring fewer computational resources for both training and testing, which positions it as an optimal choice for real-world deployment scenarios.