Wenfeng Huang, Xiangyun Liao, Hao Chen, Ying Hu, Wenjing Jia, Qiong Wang
{"title":"用于医学图像超分辨率的局部到全局深度特征学习","authors":"Wenfeng Huang , Xiangyun Liao , Hao Chen , Ying Hu , Wenjing Jia , Qiong Wang","doi":"10.1016/j.compmedimag.2024.102374","DOIUrl":null,"url":null,"abstract":"<div><p>Medical images play a vital role in medical analysis by providing crucial information about patients’ pathological conditions. However, the quality of these images can be compromised by many factors, such as limited resolution of the instruments, artifacts caused by movements, and the complexity of the scanned areas. As a result, low-resolution (LR) images cannot provide sufficient information for diagnosis. To address this issue, researchers have attempted to apply image super-resolution (SR) techniques to restore the high-resolution (HR) images from their LR counterparts. However, these techniques are designed for generic images, and thus suffer from many challenges unique to medical images. An obvious one is the diversity of the scanned objects; for example, the organs, tissues, and vessels typically appear in different sizes and shapes, and are thus hard to restore with standard convolution neural networks (CNNs). In this paper, we develop a dynamic-local learning framework to capture the details of these diverse areas, consisting of deformable convolutions with adjustable kernel shapes. Moreover, the global information between the tissues and organs is vital for medical diagnosis. To preserve global information, we propose pixel–pixel and patch–patch global learning using a non-local mechanism and a vision transformer (ViT), respectively. The result is a novel CNN-ViT neural network with Local-to-Global feature learning for medical image SR, referred to as LGSR, which can accurately restore both local details and global information. We evaluate our method on six public datasets and one large-scale private dataset, which include five different types of medical images (<em>i.e.</em>, Ultrasound, OCT, Endoscope, CT, and MRI images). Experiments show that the proposed method achieves superior PSNR/SSIM and visual performance than the state of the arts with competitive computational costs, measured in network parameters, runtime, and FLOPs. What is more, the experiment conducted on OCT image segmentation for the downstream task demonstrates a significantly positive performance effect of LGSR.</p></div>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"115 ","pages":"Article 102374"},"PeriodicalIF":5.4000,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep local-to-global feature learning for medical image super-resolution\",\"authors\":\"Wenfeng Huang , Xiangyun Liao , Hao Chen , Ying Hu , Wenjing Jia , Qiong Wang\",\"doi\":\"10.1016/j.compmedimag.2024.102374\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Medical images play a vital role in medical analysis by providing crucial information about patients’ pathological conditions. However, the quality of these images can be compromised by many factors, such as limited resolution of the instruments, artifacts caused by movements, and the complexity of the scanned areas. As a result, low-resolution (LR) images cannot provide sufficient information for diagnosis. To address this issue, researchers have attempted to apply image super-resolution (SR) techniques to restore the high-resolution (HR) images from their LR counterparts. 
However, these techniques are designed for generic images, and thus suffer from many challenges unique to medical images. An obvious one is the diversity of the scanned objects; for example, the organs, tissues, and vessels typically appear in different sizes and shapes, and are thus hard to restore with standard convolution neural networks (CNNs). In this paper, we develop a dynamic-local learning framework to capture the details of these diverse areas, consisting of deformable convolutions with adjustable kernel shapes. Moreover, the global information between the tissues and organs is vital for medical diagnosis. To preserve global information, we propose pixel–pixel and patch–patch global learning using a non-local mechanism and a vision transformer (ViT), respectively. The result is a novel CNN-ViT neural network with Local-to-Global feature learning for medical image SR, referred to as LGSR, which can accurately restore both local details and global information. We evaluate our method on six public datasets and one large-scale private dataset, which include five different types of medical images (<em>i.e.</em>, Ultrasound, OCT, Endoscope, CT, and MRI images). Experiments show that the proposed method achieves superior PSNR/SSIM and visual performance than the state of the arts with competitive computational costs, measured in network parameters, runtime, and FLOPs. What is more, the experiment conducted on OCT image segmentation for the downstream task demonstrates a significantly positive performance effect of LGSR.</p></div>\",\"PeriodicalId\":50631,\"journal\":{\"name\":\"Computerized Medical Imaging and Graphics\",\"volume\":\"115 \",\"pages\":\"Article 102374\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2024-03-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computerized Medical Imaging and Graphics\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S089561112400051X\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computerized Medical Imaging and Graphics","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S089561112400051X","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Citations: 0
Abstract
Deep local-to-global feature learning for medical image super-resolution
Medical images play a vital role in medical analysis by providing crucial information about patients’ pathological conditions. However, the quality of these images can be compromised by many factors, such as the limited resolution of the instruments, artifacts caused by movement, and the complexity of the scanned areas. As a result, low-resolution (LR) images cannot provide sufficient information for diagnosis. To address this issue, researchers have attempted to apply image super-resolution (SR) techniques to restore high-resolution (HR) images from their LR counterparts. However, these techniques are designed for generic images and thus suffer from many challenges unique to medical images. An obvious one is the diversity of the scanned objects; for example, organs, tissues, and vessels typically appear in different sizes and shapes, and are therefore hard to restore with standard convolutional neural networks (CNNs). In this paper, we develop a dynamic-local learning framework, consisting of deformable convolutions with adjustable kernel shapes, to capture the details of these diverse areas. Moreover, the global information shared between tissues and organs is vital for medical diagnosis. To preserve global information, we propose pixel–pixel and patch–patch global learning using a non-local mechanism and a vision transformer (ViT), respectively. The result is a novel CNN-ViT neural network with Local-to-Global feature learning for medical image SR, referred to as LGSR, which can accurately restore both local details and global information. We evaluate our method on six public datasets and one large-scale private dataset, covering five different types of medical images (i.e., Ultrasound, OCT, Endoscope, CT, and MRI images). Experiments show that the proposed method achieves superior PSNR/SSIM and visual performance compared with state-of-the-art methods, at competitive computational costs measured in network parameters, runtime, and FLOPs. Moreover, an experiment on OCT image segmentation as a downstream task demonstrates that LGSR brings a significant performance gain.
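To make the architecture described in the abstract more concrete, below is a minimal PyTorch sketch of a single local-to-global block: a deformable-convolution branch for adaptive local details, a pixel–pixel non-local (self-attention) branch, and a patch–patch ViT-style attention branch. The paper's actual LGSR implementation is not given in this listing, so the module layout, channel sizes, patch size, and the residual fusion below are illustrative assumptions rather than the authors' design.

```python
# Hypothetical sketch only: channel sizes, patch size, and the fusion scheme
# are assumptions; the real LGSR architecture may differ.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class LocalToGlobalBlock(nn.Module):
    """Combines a deformable-convolution local branch with pixel-level
    non-local attention and patch-level (ViT-style) attention, mirroring
    the local-to-global idea described in the abstract."""

    def __init__(self, channels: int = 64, patch: int = 8, heads: int = 4):
        super().__init__()
        # Local branch: predicted offsets let the 3x3 kernel adapt its
        # sampling locations to organs/vessels of varying size and shape.
        self.offset = nn.Conv2d(channels, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)

        # Pixel-pixel global branch: non-local self-attention over all pixels.
        self.pix_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

        # Patch-patch global branch: attention over non-overlapping patches.
        self.patch = patch
        self.patch_embed = nn.Linear(channels * patch * patch, channels)
        self.patch_attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.patch_unembed = nn.Linear(channels, channels * patch * patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape

        # --- local details via deformable convolution ---
        local = self.deform(x, self.offset(x))

        # --- pixel-pixel non-local attention ---
        seq = x.flatten(2).transpose(1, 2)               # (N, H*W, C)
        pix, _ = self.pix_attn(seq, seq, seq)
        pix = pix.transpose(1, 2).reshape(n, c, h, w)

        # --- patch-patch (ViT-style) attention ---
        p = self.patch
        patches = (x.unfold(2, p, p).unfold(3, p, p)     # (N, C, H/p, W/p, p, p)
                    .permute(0, 2, 3, 1, 4, 5)
                    .reshape(n, -1, c * p * p))          # (N, num_patches, C*p*p)
        tok = self.patch_embed(patches)
        tok, _ = self.patch_attn(tok, tok, tok)
        patches = self.patch_unembed(tok).reshape(n, h // p, w // p, c, p, p)
        glob = patches.permute(0, 3, 1, 4, 2, 5).reshape(n, c, h, w)

        # Simple residual fusion of local and global features (an assumption).
        return x + local + pix + glob


# Example: a 64-channel feature map extracted from a 64x64 LR medical image.
feats = torch.randn(1, 64, 64, 64)
out = LocalToGlobalBlock(channels=64)(feats)
print(out.shape)  # torch.Size([1, 64, 64, 64])
```

In a full SR network, several such blocks would typically be followed by an upsampling head (e.g., pixel shuffle) to produce the HR output; the exact arrangement used in LGSR may differ from this sketch.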
Journal Introduction:
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.