Optimizing pulmonary chest x-ray classification with stacked feature ensemble and swin transformer integration
Manas Ranjan Mohanty, Pradeep Kumar Mallick, Annapareddy V N Reddy
Biomedical Physics & Engineering Express, vol. 11, no. 1, published 2024-11-06
DOI: 10.1088/2057-1976/ad8c46
Abstract
This research presents an integrated framework for automatically classifying pulmonary chest x-ray images. The framework combines convolutional neural networks (CNNs) with a transformer-based classifier, aiming to improve both the accuracy and the efficiency of pulmonary chest x-ray analysis. Its first component is a feature ensemble built from the pre-trained networks VGG16, ResNet50, and MobileNetV2. The key innovation is a stacked ensemble technique that combines the outputs of these pre-trained models into a single, more comprehensive feature representation. Each image is processed separately by the three pre-trained networks, and the pooled feature maps are extracted just before the flatten layer of each model. This yields three 2D grayscale pooled images per original image. These pooled images are then stacked into a single 3-channel image, analogous to an RGB image, which serves as the classifier input in the subsequent analysis stage. Stacking the pooling-layer outputs in this way lets the classifier draw on a broader range of features while keeping the enlarged feature pool manageable to process. For classification, the study uses the Swin Transformer architecture, which captures both local and global features by computing self-attention within shifted windows. The Swin Transformer is further optimized with the artificial hummingbird algorithm (AHA), which tunes hyperparameters such as patch size, multi-layer perceptron (MLP) ratio, and number of channels to maximize classification accuracy. The proposed framework, an AHA-optimized Swin Transformer classifier operating on stacked features, is evaluated on three diverse chest x-ray datasets: VinDr-CXR, PediCXR, and MIMIC-CXR. The reported accuracies of 98.874%, 98.528%, and 98.958%, respectively, indicate that the model is robust and generalizes across different clinical scenarios and imaging conditions.
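The stacking step can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code. The abstract does not say how each multi-channel pooled output is reduced to a 2D grayscale map, so channel averaging followed by min-max scaling is assumed here. The nearest-neighbour upsampling to 224 x 224 and the helper names (`to_grayscale_map`, `upsample_nearest`, `stack_pooled_features`) are also hypothetical. Random arrays stand in for real backbone activations. Their shapes match the final convolutional outputs of Keras VGG16, ResNet50, and MobileNetV2 with `include_top=False` on 224 x 224 inputs.

```python
import numpy as np


def to_grayscale_map(feature_map):
    """Collapse an (H, W, C) pooled feature map to a 2D map scaled to [0, 1].

    Assumption: channels are averaged; the paper's exact reduction may differ.
    """
    gray = feature_map.mean(axis=-1)
    lo, hi = gray.min(), gray.max()
    if hi - lo < 1e-12:  # constant map: avoid division by zero
        return np.zeros_like(gray)
    return (gray - lo) / (hi - lo)


def upsample_nearest(img, size):
    """Nearest-neighbour resize of a 2D map to (size, size)."""
    rows = (np.arange(size) * img.shape[0]) // size
    cols = (np.arange(size) * img.shape[1]) // size
    return img[np.ix_(rows, cols)]


def stack_pooled_features(maps, size=224):
    """Stack three backbone outputs into one (size, size, 3) RGB-like image."""
    if len(maps) != 3:
        raise ValueError("expected exactly three feature maps")
    channels = [upsample_nearest(to_grayscale_map(m), size) for m in maps]
    return np.stack(channels, axis=-1).astype(np.float32)


# Random stand-ins with the output shapes of VGG16, ResNet50 and MobileNetV2
# (include_top=False, 224 x 224 input); real activations would come from the backbones.
rng = np.random.default_rng(0)
vgg16 = rng.random((7, 7, 512))
resnet50 = rng.random((7, 7, 2048))
mobilenetv2 = rng.random((7, 7, 1280))

stacked = stack_pooled_features([vgg16, resnet50, mobilenetv2])
print(stacked.shape)  # (224, 224, 3)
```

Each backbone contributes exactly one channel. The three channels therefore play the role of R, G and B, so the classifier receives an input shaped like an ordinary colour image.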
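The Swin Transformer computes self-attention inside fixed-size windows, which captures local detail. It also shifts the window grid between consecutive blocks, so information crosses window borders as depth increases and global context builds up. The minimal NumPy sketch below illustrates only the window partition and the cyclic shift. The function names are hypothetical. The attention computation itself is omitted, as is the mask that Swin applies to wrapped-around regions after the shift.

```python
import numpy as np


def window_partition(x, ws):
    """Split an (H, W, C) map into non-overlapping ws x ws windows.

    Returns (num_windows, ws * ws, C): one token sequence per window, over
    which Swin computes self-attention locally.
    """
    h, w, c = x.shape
    if h % ws or w % ws:
        raise ValueError("H and W must be divisible by the window size")
    x = x.reshape(h // ws, ws, w // ws, ws, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of windows, cols of windows, ws, ws, C)
    return x.reshape(-1, ws * ws, c)


def shifted_window_partition(x, ws):
    """Cyclically shift by ws // 2 before partitioning, as in Swin's shifted-window step.

    The new windows straddle the borders of the previous ones, so tokens that were
    in different windows can now attend to each other. (Swin's attention mask for
    wrapped-around regions is omitted here.)
    """
    s = ws // 2
    return window_partition(np.roll(x, shift=(-s, -s), axis=(0, 1)), ws)


x = np.arange(64).reshape(8, 8, 1)  # token ids on an 8 x 8 grid
print(window_partition(x, 4)[0, :, 0])          # tokens of the top-left window
print(shifted_window_partition(x, 4)[0, :, 0])  # same window after the shift
```

Before the shift, the top-left window holds rows 0 to 3 and columns 0 to 3 of the grid. After the shift, it holds rows 2 to 5 and columns 2 to 5, so it mixes tokens from all four of the original windows.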
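The AHA search over Swin hyperparameters can be sketched as shown below. This is a simplified, hypothetical illustration of the published AHA update rules, not the authors' implementation. Those rules are: guided foraging toward the food source with the highest visit level, territorial foraging around the current source, periodic migration of the worst source, and a visit table that tracks how long each source has gone unvisited. Several pieces are assumptions: the discrete search space, the encoding of each candidate as a point in [0, 1]^3 with clipping, and `surrogate_error`, which stands in for the validation error of a trained Swin model. In a real run, each evaluation would train and validate the classifier.

```python
import numpy as np

# Hypothetical search space; the paper's actual ranges are not given in the abstract.
PATCH_SIZES = [2, 4, 8]
MLP_RATIOS = [2.0, 3.0, 4.0]
EMBED_DIMS = [48, 64, 96, 128]
SPACES = [PATCH_SIZES, MLP_RATIOS, EMBED_DIMS]


def decode(x):
    """Map a point in [0, 1]^3 to a discrete (patch_size, mlp_ratio, channels) tuple."""
    return tuple(s[min(int(v * len(s)), len(s) - 1)] for v, s in zip(x, SPACES))


def surrogate_error(cfg):
    """Hypothetical stand-in for (1 - validation accuracy); minimal at (4, 4.0, 96)."""
    patch, mlp, dim = cfg
    return abs(np.log2(patch) - 2) + 0.5 * abs(mlp - 4.0) + abs(dim - 96) / 96


def direction(d, rng):
    """Flight skill: axial (1 dim), diagonal (2..d-1 dims) or omnidirectional (all)."""
    D = np.zeros(d)
    r = rng.random()
    if r < 1 / 3:
        D[rng.integers(d)] = 1.0
    elif r < 2 / 3:
        k = rng.integers(2, d) if d > 2 else d
        D[rng.choice(d, size=k, replace=False)] = 1.0
    else:
        D[:] = 1.0
    return D


def make_most_attractive(VT, k):
    """Give source k the highest visit level for every other bird."""
    others = np.arange(VT.shape[0]) != k
    VT[others, k] = VT[others].max(axis=1) + 1


def aha(objective, d, n=10, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n, d))
    F = np.array([objective(x) for x in X])
    VT = np.zeros((n, n))  # VT[i, j]: how long bird i has not visited source j
    np.fill_diagonal(VT, -np.inf)  # a bird never targets its own source
    for t in range(1, iters + 1):
        for i in range(n):
            D = direction(d, rng)
            tar = None
            if rng.random() < 0.5:  # guided foraging
                cands = np.flatnonzero(VT[i] == VT[i].max())
                tar = cands[np.argmin(F[cands])]  # ties: best fitness wins
                v = X[tar] + rng.normal() * D * (X[i] - X[tar])
            else:  # territorial foraging
                v = X[i] + rng.normal() * D * X[i]
            v = np.clip(v, 0.0, 1.0)
            fv = objective(v)
            VT[i] += 1  # all other sources go unvisited one step longer
            if tar is not None:
                VT[i, tar] = 0  # the target was just visited
            if fv < F[i]:
                X[i], F[i] = v, fv
                make_most_attractive(VT, i)
        if t % (2 * n) == 0:  # migration foraging: reseed the worst source
            w = int(np.argmax(F))
            X[w] = rng.random(d)
            F[w] = objective(X[w])
            VT[w] += 1
            make_most_attractive(VT, w)
    best = int(np.argmin(F))
    return X[best], F[best]


best_x, best_err = aha(lambda x: surrogate_error(decode(x)), d=3)
print(decode(best_x), best_err)
```

Each objective call would mean training a Swin model, so the population size `n` and the iteration count dominate the real cost. Decoding continuous positions into discrete choices is one simple way to let a continuous optimizer such as AHA search categorical settings like patch size.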
About the journal:
BPEX is an inclusive, international, multidisciplinary journal devoted to publishing new research on any application of physics and/or engineering in medicine and/or biology. The journal offers broad geographical coverage and a fast-track peer-review process; relevant topics include all aspects of biophysics, medical physics and biomedical engineering. Papers that are almost entirely clinical or biological in their focus are not suitable. The journal emphasizes interdisciplinary work that brings research fields together, encompassing experimental, theoretical and computational studies.