LoMAE: Simple Streamlined Low-level Masked Autoencoders for Robust, Generalized, and Interpretable Low-dose CT Denoising
Authors: Dayang Wang, Shuo Han, Yongshun Xu, Zhan Wu, Li Zhou, Bahareh Morovati, Hengyong Yu
Journal: IEEE Journal of Biomedical and Health Informatics (impact factor 6.7; JCR Q1, Computer Science, Information Systems)
Publication type: Journal Article
Publication date: 2024-09-05
DOI: 10.1109/JBHI.2024.3454979 (https://doi.org/10.1109/JBHI.2024.3454979)
Citations: 0
Abstract
Low-dose computed tomography (LDCT) reduces X-ray radiation exposure, but at the cost of compromised image quality, characterized by increased noise and artifacts. Recently, transformer models have emerged as a promising avenue for enhancing LDCT image quality. However, the success of such models relies on a large number of paired noisy and clean images, which are often scarce in clinical settings. In computer vision and natural language processing, masked autoencoders (MAE) have been recognized as a powerful self-pretraining method for transformers, owing to their exceptional capability to extract representative features. However, the original pretraining and fine-tuning design fails to work in low-level vision tasks such as denoising. In response to this challenge, we redesign the classical encoder-decoder learning model and propose a simple yet effective streamlined low-level vision MAE, referred to as LoMAE, tailored to the LDCT denoising problem. Moreover, we introduce an MAE-GradCAM method to shed light on the latent learning mechanisms of the MAE/LoMAE. Additionally, we explore LoMAE's robustness and generalizability across a variety of noise levels. Experimental findings show that the proposed LoMAE enhances the denoising capabilities of the transformer and substantially reduces its dependency on high-quality, ground-truth data. It also demonstrates remarkable robustness and generalizability over a spectrum of noise levels. In summary, the proposed LoMAE provides promising solutions to the major issues in LDCT, including interpretability, ground-truth data dependency, and model robustness/generalizability.
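The abstract does not spell out how LoMAE masks its inputs or computes its loss, so the following is only a rough sketch of generic MAE-style pretraining inputs, not the authors' method. It assumes non-overlapping square patches, a 75% mask ratio, and a reconstruction loss restricted to the hidden patches; the function names `random_patch_mask` and `masked_mse` and all parameter values are hypothetical and chosen purely for illustration.

```python
import random


def random_patch_mask(image, patch=4, mask_ratio=0.75, seed=0):
    """Hide a random subset of non-overlapping patches by zeroing them.

    image: 2D list of floats (H x W); H and W must be divisible by `patch`.
    Returns (masked_image, mask), where mask[r][c] is True if the patch in
    patch-row r and patch-column c was hidden.
    NOTE: patch size and mask ratio are illustrative assumptions.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    rows, cols = h // patch, w // patch
    n_patches = rows * cols
    n_masked = int(round(mask_ratio * n_patches))
    rng = random.Random(seed)
    hidden = set(rng.sample(range(n_patches), n_masked))
    mask = [[(r * cols + c) in hidden for c in range(cols)] for r in range(rows)]
    masked = [
        [0.0 if mask[y // patch][x // patch] else image[y][x] for x in range(w)]
        for y in range(h)
    ]
    return masked, mask


def masked_mse(pred, target, mask, patch=4):
    """Mean squared error over pixels that belong to hidden patches only."""
    total, count = 0.0, 0
    for y, row in enumerate(target):
        for x, t in enumerate(row):
            if mask[y // patch][x // patch]:
                total += (pred[y][x] - t) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 8x8 "image" of ones stands in for a CT slice (assumption).
    image = [[1.0] * 8 for _ in range(8)]
    masked, mask = random_patch_mask(image, patch=4, mask_ratio=0.75)
    print("hidden patches:", sum(sum(row) for row in mask))
    print("loss if the model predicts the masked input:",
          masked_mse(masked, image, mask, patch=4))
```

In a real pretraining loop, `pred` would be the output of an encoder-decoder network fed the masked image; here the masked input itself stands in for a prediction so the sketch stays self-contained.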
About the journal:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.