CheXViT: CheXNet and Vision Transformer to Multi-Label Chest X-Ray Image Classification
Muhamad Faisal, J. T. Darmawan, Nabil Bachroin, Cries Avian, Jenq-Shiou Leu, Chia-Ti Tsai
2023 IEEE International Symposium on Medical Measurements and Applications (MeMeA), 14 June 2023
DOI: 10.1109/MeMeA57477.2023.10171855
Abstract
Chest X-ray (CXR) imaging is a widely used technique for assisting radiologists in diagnosing thoracic abnormalities. Current automated CXR classification systems rely on convolutional neural network (CNN) models, which focus on local image features without considering global ones. Most approaches use CNNs, which excel at generating inductive biases that concentrate on potential regions of interest within an image. Although CNN models achieve satisfactory performance, this locality is also a limiting factor for further improvement in CXR classification. Recently, the self-attention mechanism of the transformer has been adapted to computer vision, enhancing image classification performance by capturing both short- and long-range dependencies. We therefore propose CheXViT, a hybrid CNN-Transformer classifier for multi-label CXR images that modifies CheXNet by integrating it with the vision transformer (ViT) architecture. CheXNet is a well-performing CXR classification model that can generate reliable, definitive feature maps for the ViT, widening the feature scope. The combination improves model performance by uniting the inductive biases of the CNN with the long-range feature dependencies of the transformer. Finally, the ChestX-ray14 dataset is used to evaluate the effectiveness of CheXViT. The proposed method achieves a mean AUC of 0.838, outperforming existing methods.
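To make the hybrid CNN-Transformer idea concrete, below is a minimal PyTorch sketch in which a CheXNet-style DenseNet-121 backbone produces a spatial feature map, the spatial locations are flattened into tokens, and a transformer encoder with a classification token feeds a 14-label sigmoid head. The abstract does not specify CheXViT's fusion details, so the class name `CNNTransformerCXR`, the embedding dimension, encoder depth, and head count are illustrative assumptions rather than the paper's configuration.

```python
# A minimal sketch of a CNN backbone feeding a transformer encoder for
# multi-label CXR classification. Hyperparameters are assumptions for
# illustration, not the values used in CheXViT.
import torch
import torch.nn as nn
from torchvision.models import densenet121

class CNNTransformerCXR(nn.Module):
    def __init__(self, num_labels=14, embed_dim=256, depth=4, num_heads=8):
        super().__init__()
        # CheXNet uses a DenseNet-121 backbone; its .features module yields a
        # 1024-channel 7x7 feature map for 224x224 inputs.
        self.backbone = densenet121(weights=None).features
        self.proj = nn.Conv2d(1024, embed_dim, kernel_size=1)   # channels -> token dim
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 7 * 7 + 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_labels)

    def forward(self, x):                         # x: (B, 3, 224, 224)
        fmap = self.proj(self.backbone(x))        # (B, E, 7, 7)
        tokens = fmap.flatten(2).transpose(1, 2)  # (B, 49, E) spatial tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        return torch.sigmoid(self.head(tokens[:, 0]))  # per-label probabilities

model = CNNTransformerCXR()
probs = model(torch.randn(2, 3, 224, 224))        # (2, 14) multi-label scores
```

In this sketch the CNN supplies locality-biased feature maps while the transformer attends across all spatial tokens, mirroring the paper's stated goal of combining CNN inductive biases with long-range dependencies; training would typically use a per-label binary cross-entropy loss for the 14 ChestX-ray14 findings.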