CheXViT: CheXNet and Vision Transformer to Multi-Label Chest X-Ray Image Classification
Muhamad Faisal, J. T. Darmawan, Nabil Bachroin, Cries Avian, Jenq-Shiou Leu, Chia-Ti Tsai
2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 14 June 2023
DOI: 10.1109/MeMeA57477.2023.10171855
Abstract
Chest X-ray (CXR) imaging is a widely used technique for assisting radiologists in diagnosing thoracic abnormalities. Current automated CXR classification systems rely on convolutional neural network (CNN) models, which focus on local image features without considering global ones. Most approaches use CNNs, which excel at generating inductive biases that concentrate on potential regions of interest within an image. Although CNN models achieve satisfactory performance, this locality is also a limiting factor for further improvement in CXR classification. Recently, the self-attention mechanism of the transformer has been adapted to computer vision, enhancing image classification performance by capturing both short- and long-range dependencies. We therefore propose CheXViT, a hybrid CNN-Transformer classifier for multi-label CXR images that modifies CheXNet by integrating it with the vision transformer (ViT) architecture. CheXNet is a well-performing CXR classification model that can generate reliable, definitive feature maps for the ViT, widening the feature scope. The combination improves model performance by uniting the inductive biases of the CNN with the long-range feature dependencies of the transformer. Finally, the ChestX-ray14 dataset is used to evaluate the effectiveness of CheXViT. The proposed method achieves a mean AUC of 0.838, outperforming existing methods.
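To make the hybrid CNN-Transformer idea concrete, below is a minimal PyTorch sketch in which a CheXNet-style DenseNet-121 backbone produces a spatial feature map, the spatial locations are flattened into tokens, and a transformer encoder with a classification token feeds a 14-label sigmoid head. The abstract does not specify CheXViT's fusion details, so the class name `CNNTransformerCXR`, the embedding dimension, encoder depth, and head count are illustrative assumptions rather than the paper's configuration.

```python
# A minimal sketch of a CNN backbone feeding a transformer encoder for
# multi-label CXR classification. Hyperparameters are assumptions for
# illustration, not the values used in CheXViT.
import torch
import torch.nn as nn
from torchvision.models import densenet121

class CNNTransformerCXR(nn.Module):
    def __init__(self, num_labels=14, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # CheXNet uses a DenseNet-121 backbone; its .features module yields a
        # 1024-channel 7x7 feature map for 224x224 inputs.
        self.backbone = densenet121(weights=None).features
        self.proj = nn.Conv2d(1024, embed_dim, kernel_size=1)   # channels -> token dim
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 7 * 7 + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_labels)

    def forward(self, x):                         # x: (B, 3, 224, 224)
        fmap = self.proj(self.backbone(x))        # (B, E, 7, 7)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, 49, E) spatial tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return torch.sigmoid(self.head(tokens[:, 0]))  # per-label probabilities

model = CNNTransformerCXR()
probs = model(torch.randn(2, 3, 224, 224))        # (2, 14) multi-label scores
```

In this sketch the CNN supplies locality-biased feature maps while the transformer attends across all spatial tokens, mirroring the paper's stated goal of combining CNN inductive biases with long-range dependencies; training would typically use a per-label binary cross-entropy loss for the 14 ChestX-ray14 findings.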