{"title":"Multitask Swin Transformer for classification and characterization of pulmonary nodules in CT images.","authors":"Haizhe Jin, Cheng Yu, Jiahao Zhang, Renjie Zheng, Yongyan Fu, Yinan Zhao","doi":"10.21037/qims-24-1619","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Early diagnosis of pulmonary nodules is essential for effective prevention and treatment of pulmonary cancer. However, the heterogeneous and complex characteristics of pulmonary nodules, such as shape, size, speculation, and texture, present significant challenges in clinical diagnosis, which computer-aided diagnosis (CAD) can help address. Moreover, the varied performance of deep learning methods in CAD and limited model interpretability often hinder clinicians' understanding of CAD results. In this study, we propose a multitask Swin Transformer (MTST) for classifying benign and malignant pulmonary nodules, which outputs nodule features as classification criteria.</p><p><strong>Methods: </strong>We introduce a MTST model for feature extraction, designed with a multitask layer that simultaneously outputs benign and malignant binary classification, multilevel classification, and a detailed analysis of pulmonary nodule features. In addition, we incorporate image augmentation using a U-Net generative adversarial network (GAN) model to enhance the training process.</p><p><strong>Results: </strong>Experimental findings on the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset demonstrate that the proposed MTST outperforms conventional convolutional neural networks (CNN)-based networks across multiple tasks. Specifically, MTST achieved an accuracy of 93.24% in binary classification of benign and malignant nodules and demonstrated superior performance in nodule feature evaluation. For multilevel classification of pulmonary nodules, the Swin Transformer achieved an accuracy of 95.73%. On the training, validation, and test sets (9,600/2,400/1,600 nodules), the MTST model achieved an accuracy of 93.74%, sensitivity of 91.55%, and specificity of 96.09%. The results indicate that the MTST model aligns well with clinical diagnostic practices, offering improved performance and reliability.</p><p><strong>Conclusions: </strong>The MTST model's efficacy in binary classification, multiclass classification, and feature evaluation confirms its potential as a valuable tool for CAD systems in clinical settings.</p>","PeriodicalId":54267,"journal":{"name":"Quantitative Imaging in Medicine and Surgery","volume":"15 3","pages":"1845-1861"},"PeriodicalIF":2.9000,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11948416/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Quantitative Imaging in Medicine and Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.21037/qims-24-1619","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/26 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Abstract
Background: Early diagnosis of pulmonary nodules is essential for effective prevention and treatment of lung cancer. However, the heterogeneous and complex characteristics of pulmonary nodules, such as shape, size, spiculation, and texture, present significant challenges in clinical diagnosis, which computer-aided diagnosis (CAD) can help address. Moreover, the varied performance of deep learning methods in CAD and limited model interpretability often hinder clinicians' understanding of CAD results. In this study, we propose a multitask Swin Transformer (MTST) for classifying benign and malignant pulmonary nodules, which also outputs nodule features as classification criteria.
Methods: We introduce an MTST model for feature extraction, designed with a multitask layer that simultaneously outputs benign and malignant binary classification, multilevel classification, and a detailed analysis of pulmonary nodule features. In addition, we incorporate image augmentation using a U-Net generative adversarial network (GAN) model to enhance the training process.
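To make the multitask design concrete, the sketch below shows how a shared Swin Transformer backbone can feed three parallel output heads (binary malignancy, multilevel rating, nodule feature scores). It is a minimal illustration only: the timm backbone name, head dimensions, class `MultitaskSwin`, and the number of nodule features are assumptions, not the authors' exact configuration or training setup.

```python
# Minimal sketch of a multitask head on a Swin Transformer backbone.
# Assumptions (not from the paper): timm backbone "swin_tiny_patch4_window7_224",
# 5 malignancy levels, 8 nodule-feature outputs.
import torch
import torch.nn as nn
import timm


class MultitaskSwin(nn.Module):
    def __init__(self, backbone_name="swin_tiny_patch4_window7_224",
                 num_malignancy_levels=5, num_nodule_features=8):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of logits.
        self.backbone = timm.create_model(backbone_name, pretrained=False,
                                          num_classes=0)
        dim = self.backbone.num_features
        # Task 1: benign vs. malignant (binary classification).
        self.binary_head = nn.Linear(dim, 2)
        # Task 2: multilevel malignancy classification (e.g., LIDC-IDRI ratings).
        self.multilevel_head = nn.Linear(dim, num_malignancy_levels)
        # Task 3: nodule feature scores (e.g., spiculation, texture, margin).
        self.feature_head = nn.Linear(dim, num_nodule_features)

    def forward(self, x):
        feats = self.backbone(x)  # (batch, dim) pooled features
        return {
            "binary": self.binary_head(feats),
            "multilevel": self.multilevel_head(feats),
            "features": self.feature_head(feats),
        }


if __name__ == "__main__":
    model = MultitaskSwin()
    out = model(torch.randn(2, 3, 224, 224))
    print({name: logits.shape for name, logits in out.items()})
```

In a setup like this, the three heads would typically be trained jointly with a weighted sum of per-task losses; the paper's actual loss formulation is not reproduced here.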
Results: Experimental findings on the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset demonstrate that the proposed MTST outperforms conventional convolutional neural network (CNN)-based networks across multiple tasks. Specifically, MTST achieved an accuracy of 93.24% in binary classification of benign and malignant nodules and demonstrated superior performance in nodule feature evaluation. For multilevel classification of pulmonary nodules, the Swin Transformer achieved an accuracy of 95.73%. On the training, validation, and test sets (9,600/2,400/1,600 nodules), the MTST model achieved an accuracy of 93.74%, sensitivity of 91.55%, and specificity of 96.09%. The results indicate that the MTST model aligns well with clinical diagnostic practices, offering improved performance and reliability.
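For readers less familiar with these metrics, the snippet below spells out how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The counts in the usage example are invented for illustration and are not the paper's data.

```python
# How the reported metrics relate to a binary confusion matrix
# (malignant = positive class). Example counts are made up.
def binary_metrics(tp: int, tn: int, fp: int, fn: int):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true-positive rate: malignant nodules caught
    specificity = tn / (tn + fp)  # true-negative rate: benign nodules cleared
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts for a 1,600-nodule test set.
    print(binary_metrics(tp=640, tn=870, fp=35, fn=55))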
Conclusions: The MTST model's efficacy in binary classification, multiclass classification, and feature evaluation confirms its potential as a valuable tool for CAD systems in clinical settings.