Predicting COVID-19 Cases on a Large Chest X-Ray Dataset Using Modified Pre-trained CNN Architectures

IF 0.5 Q4 COMPUTER SCIENCE, THEORY & METHODS

Applied Computer Systems Pub Date : 2023-06-01 DOI:10.2478/acss-2023-0005

Abdulkadir Karac

Pages: 44 - 57


Abstract

The coronavirus spreads very quickly and has therefore had highly destructive effects in many areas worldwide. Because X-ray imaging is easily accessible, fast, and inexpensive, it is widely used worldwide to diagnose COVID-19. This study attempts to detect COVID-19 from X-ray images using the pre-trained VGG16, VGG19, InceptionV3, and ResNet50 CNN architectures and modified versions of them. In the modified architectures, the fully connected layers of the pre-trained networks were reorganized. All architectures were trained on binary and three-class datasets to establish their classification performance. The dataset was collected from four different sources and consists of 594 COVID-19, 1345 viral pneumonia, and 1341 normal X-ray images. The models were built in Python using the TensorFlow and Keras libraries. The dataset was preprocessed by resizing, normalization, and one-hot encoding. Model performance was evaluated with 5-fold cross-validation using several metrics, including recall, specificity, accuracy, precision, F1-score, the confusion matrix, and ROC analysis. The highest classification performance was obtained by the modified VGG19 model, with 99.84 % accuracy for binary classification (COVID-19 vs. Normal), and by the modified VGG16 model, with 98.26 % accuracy for three-class classification (COVID-19 vs. Pneumonia vs. Normal). These accuracies are higher than those reported in other studies in the literature. In addition, the number of COVID-19 X-ray images in the dataset used in this study is approximately twice that used in other studies. Because the images come from different sources, the dataset is heterogeneous and follows no common standard. It is noteworthy that higher classification performance was achieved than in previous studies despite this. The modified VGG16 and VGG19 models (available at github.com/akaraci/LargeDatasetCovid19) can be used as an auxiliary tool for detecting COVID-19 in healthcare organizations with a shortage of specialists.
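
The evaluation metrics named in the abstract (recall, specificity, precision, F1-score, accuracy) can all be derived from a confusion matrix. The following is a minimal sketch of that derivation for the three-class case, not the author's evaluation code. The function names `per_class_metrics` and `accuracy` are hypothetical, and the matrix `toy_cm` contains invented toy counts for illustration only; it does not reproduce any result from the study.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest metrics per class.

    cm[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    total = sum(sum(row) for row in cm)
    results: Dict[str, Dict[str, float]] = {}
    for k, name in enumerate(labels):
        tp = cm[k][k]                          # class k predicted as k
        fn = sum(cm[k]) - tp                   # class k predicted as something else
        fp = sum(row[k] for row in cm) - tp    # other classes predicted as k
        tn = total - tp - fn - fp              # everything not involving class k
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[name] = {
            "recall": recall,
            "specificity": specificity,
            "precision": precision,
            "f1": f1,
        }
    return results


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Hypothetical toy counts, NOT results from the paper.
labels = ["COVID-19", "Pneumonia", "Normal"]
toy_cm = [
    [50, 2, 1],   # true COVID-19
    [3, 90, 7],   # true Pneumonia
    [0, 5, 95],   # true Normal
]

metrics = per_class_metrics(toy_cm, labels)
for name in labels:
    print(name, {key: round(value, 4) for key, value in metrics[name].items()})
print("accuracy", round(accuracy(toy_cm), 4))
```

Under 5-fold cross-validation, one such matrix would typically be produced per fold and the metrics averaged across folds; whether the study averages per fold or pools the predictions is not stated in the abstract.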

