A Study of Vision Transformer for Lung Diseases Classification
Authors: M. Nguyen, Khai Ngo Quang
Venue: 2022 6th International Conference on Green Technology and Sustainable Development (GTSD)
Published: 2022-07-29
DOI: 10.1109/GTSD54989.2022.9989100 (https://doi.org/10.1109/GTSD54989.2022.9989100)
Transformer models have achieved considerable success in natural language processing. In computer vision, transformer-based backbones have recently become competitive with CNN-based backbones on many tasks. The success of transformer-based backbones relies on pre-training on very large datasets. However, this requirement may not be satisfied in medical imaging applications: compared with the ImageNet-21K dataset, medical image datasets are very limited. Therefore, in this paper, we investigate the performance of the Vision Transformer (ViT) on medical image classification. The Vision Transformer is first fine-tuned on well-known medical datasets and is then fine-tuned again on the VinDr-CXR dataset. Comprehensive experiments show that the proposed method is slightly better than conventional convolution-based methods in terms of classification accuracy. In terms of model interpretability, ViT-based models can also handle the co-occurrence of multiple diseases in a single medical image.
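
The abstract does not say how the classifier output is structured, so the following is only a minimal sketch of one common way to let a model report several co-occurring findings on one image: score each label with an independent sigmoid and keep every label above a threshold, instead of picking a single top class. The label names, logit values, the 0.5 threshold, and the helper functions `predict_findings` and `predict_single` are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_findings(logits, labels, threshold=0.5):
    """Multi-label decision: keep every label whose own sigmoid score
    reaches the threshold, so several findings can be reported at once."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) >= threshold]


def predict_single(logits, labels):
    """Single-label decision: keep only the highest-scoring label."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


# Hypothetical label names and logits, for illustration only.
LABELS = ["finding_a", "finding_b", "finding_c"]
logits = [2.0, 1.5, -3.0]

multi = predict_findings(logits, LABELS)   # two labels pass the threshold
single = predict_single(logits, LABELS)    # only the top label survives

print("multi-label :", multi)
print("single-label:", single)
```

With these made-up logits, the multi-label rule keeps both of the first two findings, while the single-label rule discards the second one. That difference is why an independent per-label output is a natural fit when diseases co-occur.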