Pure Vision Transformer (CT-ViT) with Noise2Neighbors Interpolation for Low-Dose CT Image Denoising

IF 2.9 2区工程技术 Q2 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Journal of Digital Imaging Pub Date : 2024-04-15 DOI:10.1007/s10278-024-01108-8

Luella Marcos, Paul Babyn, Javad Alirezaie

{"title":"Pure Vision Transformer (CT-ViT) with Noise2Neighbors Interpolation for Low-Dose CT Image Denoising","authors":"Luella Marcos, Paul Babyn, Javad Alirezaie","doi":"10.1007/s10278-024-01108-8","DOIUrl":null,"url":null,"abstract":"<p>Convolutional neural networks (CNN) have been used for a wide variety of deep learning applications, especially in computer vision. For medical image processing, researchers have identified certain challenges associated with CNNs. These challenges encompass the generation of less informative features, limitations in capturing both high and low-frequency information within feature maps, and the computational cost incurred when enhancing receptive fields by deepening the network. Transformers have emerged as an approach aiming to address and overcome these specific limitations of CNNs in the context of medical image analysis. Preservation of all spatial details of medical images is necessary to ensure accurate patient diagnosis. Hence, this research introduced the use of a pure Vision Transformer (ViT) for a denoising artificial neural network for medical image processing specifically for low-dose computed tomography (LDCT) image denoising. The proposed model follows a U-Net framework that contains ViT modules with the integration of Noise2Neighbor (N2N) interpolation operation. Five different datasets containing LDCT and normal-dose CT (NDCT) image pairs were used to carry out this experiment. To test the efficacy of the proposed model, this experiment includes comparisons between the quantitative and visual results among CNN-based (BM3D, RED-CNN, DRL-E-MP), hybrid CNN-ViT-based (TED-Net), and the proposed pure ViT-based denoising model. The findings of this study showed that there is about 15–20% increase in SSIM and PSNR when using self-attention transformers than using the typical pure CNN. Visual results also showed improvements especially when it comes to showing fine structural details of CT images.</p>","PeriodicalId":50214,"journal":{"name":"Journal of Digital Imaging","volume":"2 1","pages":""},"PeriodicalIF":2.9000,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Digital Imaging","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s10278-024-01108-8","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}

引用次数: 0

Abstract

Convolutional neural networks (CNN) have been used for a wide variety of deep learning applications, especially in computer vision. For medical image processing, researchers have identified certain challenges associated with CNNs. These challenges encompass the generation of less informative features, limitations in capturing both high and low-frequency information within feature maps, and the computational cost incurred when enhancing receptive fields by deepening the network. Transformers have emerged as an approach aiming to address and overcome these specific limitations of CNNs in the context of medical image analysis. Preservation of all spatial details of medical images is necessary to ensure accurate patient diagnosis. Hence, this research introduced the use of a pure Vision Transformer (ViT) for a denoising artificial neural network for medical image processing specifically for low-dose computed tomography (LDCT) image denoising. The proposed model follows a U-Net framework that contains ViT modules with the integration of Noise2Neighbor (N2N) interpolation operation. Five different datasets containing LDCT and normal-dose CT (NDCT) image pairs were used to carry out this experiment. To test the efficacy of the proposed model, this experiment includes comparisons between the quantitative and visual results among CNN-based (BM3D, RED-CNN, DRL-E-MP), hybrid CNN-ViT-based (TED-Net), and the proposed pure ViT-based denoising model. The findings of this study showed that there is about 15–20% increase in SSIM and PSNR when using self-attention transformers than using the typical pure CNN. Visual results also showed improvements especially when it comes to showing fine structural details of CT images.

Abstract Image

查看原文本刊更多论文

采用 Noise2Neighbors 插值技术的纯视觉变换器（CT-ViT）用于低剂量 CT 图像去噪

卷积神经网络（CNN）已被广泛用于各种深度学习应用，尤其是计算机视觉领域。在医学图像处理方面，研究人员发现了与卷积神经网络相关的某些挑战。这些挑战包括生成信息量较少的特征、在特征图中捕捉高频和低频信息的局限性，以及通过加深网络来增强感受野所产生的计算成本。在医学图像分析中，变形器作为一种方法应运而生，旨在解决和克服 CNN 的这些特定局限性。要确保准确诊断病人，就必须保留医学图像的所有空间细节。因此，本研究将纯视觉变换器（ViT）用于医学图像处理的去噪人工神经网络，特别是用于低剂量计算机断层扫描（LDCT）图像的去噪。所提出的模型采用 U-Net 框架，其中包含 ViT 模块，并集成了 Noise2Neighbor（N2N）插值操作。实验使用了五个不同的数据集，其中包含 LDCT 和正常剂量 CT（NDCT）图像对。为了测试所提模型的功效，本实验比较了基于 CNN（BM3D、RED-CNN、DRL-E-MP）、基于 CNN-ViT 混合（TED-Net）和所提纯 ViT 去噪模型的定量和视觉结果。研究结果表明，使用自注意变换器时，SSIM 和 PSNR 比使用典型的纯 CNN 时提高了约 15-20%。视觉结果也显示，尤其是在显示 CT 图像的精细结构细节方面，效果有所改善。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Digital Imaging 医学-核医学

CiteScore

7.50

自引率

6.80%

发文量

192

审稿时长

6-12 weeks

期刊介绍： The Journal of Digital Imaging (JDI) is the official peer-reviewed journal of the Society for Imaging Informatics in Medicine (SIIM). JDI’s goal is to enhance the exchange of knowledge encompassed by the general topic of Imaging Informatics in Medicine such as research and practice in clinical, engineering, and information technologies and techniques in all medical imaging environments. JDI topics are of interest to researchers, developers, educators, physicians, and imaging informatics professionals. Suggested Topics PACS and component systems; imaging informatics for the enterprise; image-enabled electronic medical records; RIS and HIS; digital image acquisition; image processing; image data compression; 3D, visualization, and multimedia; speech recognition; computer-aided diagnosis; facilities design; imaging vocabularies and ontologies; Transforming the Radiological Interpretation Process (TRIP™); DICOM and other standards; workflow and process modeling and simulation; quality assurance; archive integrity and security; teleradiology; digital mammography; and radiological informatics education.