On Data Bias and the Usability of Deep Learning Algorithms in Classifying COVID-19 based on Chest X-ray

Hassan Ezzeddine, M. Awad, Alain S. Abi Ghanem, Bassem Mourani
2021 IEEE 3rd International Multidisciplinary Conference on Engineering Technology (IMCET), published 2021-12-08
DOI: 10.1109/imcet53404.2021.9665574 (https://doi.org/10.1109/imcet53404.2021.9665574)
Citations: 1

Abstract

SARS-CoV-2 is a new strain of virus that was first detected in China. It quickly spread across the world, affecting millions of people. For this reason, early detection is essential to limit its spread. Real-time reverse transcription polymerase chain reaction (RT-PCR) and the antibody test are the main tests used to detect the virus. Chest X-rays (CXRs) and computerized tomography (CT) scans are also used to detect the virus, although the American College of Radiology does not recommend using medical imaging as a diagnostic tool. As with other medical imaging tasks, convolutional neural networks are used to classify the images. We believe that developing a model to detect COVID-19 has no clinical value regardless of the accuracy achieved, since 58% of CXRs seem to be normal. During the literature review, we found several papers reporting suspiciously high accuracies of 90% or more. We believe that the dataset used to train and validate these networks is biased and not appropriate for deep learning, as every model we trained on the same dataset achieved high accuracy. Our experiments on Cohen's COVID dataset, augmented with the Wang dataset, show that any model trained on the Cohen dataset can easily achieve high accuracy. This was further validated with two experienced radiologists who participated in this study and were able to classify only 60% as being COVID-19. Our study highlights the importance of addressing bias in data and of developing trustworthy and explainable ML models based on well-curated data.
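The abstract's central claim, that a biased dataset lets almost any model score well, can be illustrated with a minimal synthetic sketch. This is not the authors' experiment and uses none of their data. It assumes a hypothetical "shortcut" feature (for example, mean image intensity) that depends only on which source an image came from, and a confounded dataset in which every positive case comes from one source and every negative from another. All function names and numeric values below are made up for illustration.

```python
import random


def make_dataset(n, confounded, rng):
    """Build (shortcut_feature, label) pairs for n hypothetical images.

    The shortcut feature carries no disease information; it only reflects
    the acquisition source (offset 1.0 for source A, 0.0 for source B).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        if confounded:
            # Biased collection: positives all from source A, negatives from B.
            offset = 1.0 if label == 1 else 0.0
        else:
            # Unbiased collection: source is independent of the label.
            offset = float(rng.randint(0, 1))
        data.append((rng.gauss(offset, 0.1), label))
    return data


def fit_threshold(train):
    """Trivial 'model': threshold midway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of samples where (feature > threshold) matches the label."""
    correct = sum((x > threshold) == (y == 1) for x, y in data)
    return correct / len(data)


rng = random.Random(0)
train = make_dataset(2000, confounded=True, rng=rng)
test_internal = make_dataset(2000, confounded=True, rng=rng)
test_external = make_dataset(2000, confounded=False, rng=rng)

threshold = fit_threshold(train)
acc_internal = accuracy(threshold, test_internal)
acc_external = accuracy(threshold, test_external)

print(f"accuracy on a test split of the biased data: {acc_internal:.2f}")
print(f"accuracy when source and label are independent: {acc_external:.2f}")
```

Under these assumptions, the one-parameter threshold scores near-perfectly on a held-out split of the biased data and close to chance once the source no longer tracks the label. High accuracy on an internal split therefore says little about whether a model has learned anything clinically meaningful.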