Comparison of diagnostic accuracy of the artificial intelligence system with human readers in the diagnosis of portable chest x-rays during the COVID-19 pandemic

L. David, W. Elshami, Aisha Alshuweihi, Abdulmunhem Obaideen, B. Issa, S. Shetty
Journal: Advances in Biomedical and Health Sciences (volume listed as "4 1")
Published: 2023-01-01
DOI: 10.4103/abhs.abhs_29_22 (https://doi.org/10.4103/abhs.abhs_29_22)

Abstract

Background: Evaluating the performance of available machine learning software is fundamental to ensuring trustworthiness and improving automated diagnosis. This study compared the diagnostic accuracy of artificial intelligence (AI) system reporting with that of human readers for portable chest anteroposterior (AP) x-rays acquired from patients in a semi-recumbent position.

Methods: Ninety-four patients with clinically suspected or confirmed COVID-19 who underwent portable chest AP radiography were included in the study; among them, 65 were COVID-19 positive and 29 had symptoms. High-resolution computed tomography (HRCT) of the chest was available for 39 patients. Images were read by two radiologists (R1, R2) and by the AI system. In case of disagreement between R1 and R2, a third radiologist (R3) read the images; however, if chest HRCT was available, the HRCT finding was used instead of R3's reading. Thus, the gold standard was either HRCT or the reading on which two radiologists agreed (R1 = R2, R1 = R3, or R2 = R3).

Results: The sensitivity of the AI system in detecting pleural effusion and consolidation was 100% and 91.3%, respectively. The specificity of the AI system in detecting pleural effusion and lung consolidation was 84% and 61%, respectively. Agreement between the gold standard and AI was not good for other chest pathologies.

Conclusion: AI showed significant moderate agreement with the gold standard for pleural effusion and consolidation. There was no significant agreement between the gold standard and AI for widened mediastinum, collapse, and other pathologies. Future multicenter studies with large sample sizes, multiple clinical indications, and multiple radiographic views are recommended.
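
The reference-standard rule and the accuracy measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `reference_label` and `sensitivity_specificity` are hypothetical, and the sketch assumes one reading of the abstract: HRCT is consulted only when R1 and R2 disagree, and R3 is used only when HRCT is unavailable. The toy readings at the end are invented for illustration and are not study data.

```python
from typing import Optional, Sequence, Tuple


def reference_label(r1: bool, r2: bool,
                    r3: Optional[bool] = None,
                    hrct: Optional[bool] = None) -> bool:
    """Reference-standard label for one finding on one radiograph.

    Assumed interpretation of the abstract: if R1 and R2 agree, their
    shared reading is used; otherwise HRCT breaks the tie when it is
    available, and R3 breaks it when HRCT is not available.
    """
    if r1 == r2:
        return r1
    if hrct is not None:
        return hrct
    if r3 is None:
        raise ValueError("R1 and R2 disagree and no tie-breaker was given")
    return r3


def sensitivity_specificity(ai: Sequence[bool],
                            reference: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of AI calls against the reference."""
    pairs = list(zip(ai, reference))
    tp = sum(1 for a, r in pairs if a and r)          # AI positive, truly positive
    tn = sum(1 for a, r in pairs if not a and not r)  # AI negative, truly negative
    fp = sum(1 for a, r in pairs if a and not r)      # AI positive, truly negative
    fn = sum(1 for a, r in pairs if not a and r)      # AI negative, truly positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (invented readings, not data from the study).
ai_calls = [True, True, False, False, True]
reference = [True, False, False, False, True]
sens, spec = sensitivity_specificity(ai_calls, reference)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A pattern of high sensitivity with lower specificity, as reported for consolidation (91.3% and 61%), corresponds to few missed positives (`fn`) but a larger share of false alarms (`fp`) among the reference-negative cases.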