A novel UNet-SegNet and vision transformer architectures for efficient segmentation and classification in medical imaging.

IF 2.0 · Region 4 (Medicine) · Q3 ENGINEERING, BIOMEDICAL
Simon Tongbram, Benjamin A Shimray, Loitongbam Surajkumar Singh
{"title":"A novel UNet-SegNet and vision transformer architectures for efficient segmentation and classification in medical imaging.","authors":"Simon Tongbram, Benjamin A Shimray, Loitongbam Surajkumar Singh","doi":"10.1007/s13246-025-01564-8","DOIUrl":null,"url":null,"abstract":"<p><p>Medical imaging has become an essential tool in the diagnosis and treatment of various diseases, and provides critical insights through ultrasound, MRI, and X-ray modalities. Despite its importance, challenges remain in the accurate segmentation and classification of complex structures owing to factors such as low contrast, noise, and irregular anatomical shapes. This study addresses these challenges by proposing a novel hybrid deep learning model that integrates the strengths of Convolutional Autoencoders (CAE), UNet, and SegNet architectures. In the preprocessing phase, a Convolutional Autoencoder is used to effectively reduce noise while preserving essential image details, ensuring that the images used for segmentation and classification are of high quality. The ability of CAE to denoise images while retaining critical features enhances the accuracy of the subsequent analysis. The developed model employs UNet for multiscale feature extraction and SegNet for precise boundary reconstruction, with Dynamic Feature Fusion integrated at each skip connection to dynamically weight and combine the feature maps from the encoder and decoder. This ensures that both global and local features are effectively captured, while emphasizing the critical regions for segmentation. To further enhance the model's performance, the Hybrid Emperor Penguin Optimizer (HEPO) was employed for feature selection, while the Hybrid Vision Transformer with Convolutional Embedding (HyViT-CE) was used for the classification task. This hybrid approach allows the model to maintain high accuracy across different medical imaging tasks. 
The model was evaluated using three major datasets: brain tumor MRI, breast ultrasound, and chest X-rays. The results demonstrate exceptional performance, achieving an accuracy of 99.92% for brain tumor segmentation, 99.67% for breast cancer detection, and 99.93% for chest X-ray classification. These outcomes highlight the ability of the model to deliver reliable and accurate diagnostics across various medical contexts, underscoring its potential as a valuable tool in clinical settings. The findings of this study will contribute to advancing deep learning applications in medical imaging, addressing existing research gaps, and offering a robust solution for improved patient care.</p>","PeriodicalId":48490,"journal":{"name":"Physical and Engineering Sciences in Medicine","volume":" ","pages":"1023-1055"},"PeriodicalIF":2.0000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Physical and Engineering Sciences in Medicine","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s13246-025-01564-8","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/7/8 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0

Abstract

Medical imaging has become an essential tool in the diagnosis and treatment of various diseases, providing critical insights through ultrasound, MRI, and X-ray modalities. Despite its importance, challenges remain in the accurate segmentation and classification of complex structures owing to low contrast, noise, and irregular anatomical shapes. This study addresses these challenges by proposing a novel hybrid deep learning model that integrates the strengths of Convolutional Autoencoders (CAE), UNet, and SegNet. In the preprocessing phase, a CAE reduces noise while preserving essential image detail, ensuring that the images passed to segmentation and classification are of high quality; its ability to denoise while retaining critical features improves the accuracy of the subsequent analysis. The model employs UNet for multiscale feature extraction and SegNet for precise boundary reconstruction, with Dynamic Feature Fusion integrated at each skip connection to dynamically weight and combine the encoder and decoder feature maps. This captures both global and local features while emphasizing the regions critical for segmentation. To further improve performance, the Hybrid Emperor Penguin Optimizer (HEPO) is employed for feature selection and the Hybrid Vision Transformer with Convolutional Embedding (HyViT-CE) for classification, allowing the model to maintain high accuracy across different medical imaging tasks. The model was evaluated on three major datasets: brain tumor MRI, breast ultrasound, and chest X-ray. It achieved an accuracy of 99.92% for brain tumor segmentation, 99.67% for breast cancer detection, and 99.93% for chest X-ray classification.

These outcomes highlight the model's ability to deliver reliable, accurate diagnostics across varied medical contexts, underscoring its potential as a valuable tool in clinical settings. The findings contribute to advancing deep learning applications in medical imaging, addressing existing research gaps, and offering a robust solution for improved patient care.
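The Dynamic Feature Fusion described above can be pictured as a gated, per-channel convex combination of the encoder and decoder feature maps at a skip connection. The sketch below is an illustrative NumPy reconstruction based only on the abstract's description, not the paper's exact formulation; the gate projection `w_gate` is a fixed random stand-in for weights that would be learned during training.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_feature_fusion(enc_feat, dec_feat, w_gate):
    """Gated fusion of encoder and decoder maps at a skip connection.

    A per-channel gate alpha in (0, 1) is computed from pooled channel
    descriptors of both maps, then used to weight and combine them.
    Shapes: enc_feat, dec_feat are (C, H, W); w_gate is (C, 2C).
    """
    # Global average pool each map to a channel descriptor, then concatenate.
    desc = np.concatenate([enc_feat.mean(axis=(1, 2)),
                           dec_feat.mean(axis=(1, 2))])      # (2C,)
    # Project the joint descriptor to one gate value per channel.
    alpha = sigmoid(w_gate @ desc)                           # (C,)
    alpha = alpha[:, None, None]                             # broadcast over H, W
    # Convex combination: alpha emphasizes encoder detail vs decoder context.
    return alpha * enc_feat + (1.0 - alpha) * dec_feat

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
enc = rng.standard_normal((C, H, W))
dec = rng.standard_normal((C, H, W))
w_gate = rng.standard_normal((C, 2 * C))
fused = dynamic_feature_fusion(enc, dec, w_gate)
print(fused.shape)  # (4, 8, 8)
```

Because the gate is a convex weight, every fused value lies between the corresponding encoder and decoder activations, so the fusion reweights rather than amplifies features; in the full model this gating would be applied at every skip connection.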

Source journal metrics: CiteScore 8.40 · Self-citation rate 4.50% · Articles published 110