Hassan Ezzeddine, M. Awad, Alain S. Abi Ghanem, Bassem Mourani
On Data Bias and the Usability of Deep Learning Algorithms in Classifying COVID-19 based on Chest X-ray
2021 IEEE 3rd International Multidisciplinary Conference on Engineering Technology (IMCET), published 2021-12-08
DOI: 10.1109/imcet53404.2021.9665574 (https://doi.org/10.1109/imcet53404.2021.9665574)
Citations: 1
Abstract
SARS-CoV-2 is a new strain of virus that was first detected in China. It quickly spread across the world, affecting millions of people. For this reason, early detection is essential to limit its spread. Real-time reverse transcription polymerase chain reaction (RT-PCR) and the antibody test are the main tests used to detect the virus. Chest X-rays (CXRs) and computerized tomography (CT) scans are also used, although the American College of Radiology does not recommend medical imaging as a diagnostic tool. As with other medical imaging, convolutional neural networks are used to classify these images. We believe that developing a model to detect COVID-19 has no clinical value regardless of the accuracy achieved, since 58% of CXRs appear normal. During our literature review, we found several papers reporting suspiciously high accuracies of 90% and above. We believe that the dataset used to train and validate these networks is biased and not appropriate for deep learning, because every model we trained on it achieved high accuracy. Our experiments on Cohen's COVID dataset, augmented with the Wang dataset, show that any model trained on the Cohen dataset can easily achieve high accuracy. This was further validated by two experienced radiologists who participated in this study and were able to classify only 60% of the images as COVID-19. Our study highlights the importance of addressing bias in data and of developing trustworthy and explainable ML models based on well-curated data.
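The bias argument in the abstract can be illustrated with a toy sketch. This is not the authors' experiment and uses no real data: the dataset is synthetic, and the names `make_dataset`, `border`, and `signal`, as well as the 0.95 and 0.60 agreement rates, are arbitrary assumptions chosen for illustration. The sketch assumes that the positive and negative images come from different sources, so a source artifact (here `border`) tracks the label far more closely than the weak disease-related feature (here `signal`).

```python
import random


def make_dataset(n, seed=0):
    """Build a synthetic two-source dataset (hypothetical, not the paper's data)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = image from the COVID source, 0 = other source
        # 'border' stands in for a source artifact (assumed to agree with the label 95% of the time)
        border = label if rng.random() < 0.95 else 1 - label
        # 'signal' stands in for a weak disease-related feature (assumed 60% agreement)
        signal = label if rng.random() < 0.60 else 1 - label
        samples.append({"border": border, "signal": signal, "label": label})
    return samples


def accuracy(samples, feature):
    """Accuracy of the trivial rule 'predict the label to be the value of this feature'."""
    hits = sum(1 for s in samples if s[feature] == s["label"])
    return hits / len(samples)


data = make_dataset(2000)
shortcut_acc = accuracy(data, "border")  # rule that only looks at the source artifact
signal_acc = accuracy(data, "signal")    # rule that only looks at the weak feature

print(f"shortcut (source artifact) accuracy: {shortcut_acc:.2f}")
print(f"weak-feature accuracy:               {signal_acc:.2f}")
```

Under these assumptions, a rule that never looks at disease-related information still scores high, which is the pattern the abstract describes: high accuracy on such a dataset says little about whether a model has learned anything clinically meaningful.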