Comparison of diagnostic accuracy of the artificial intelligence system with human readers in the diagnosis of portable chest x-rays during the COVID-19 pandemic
L. David, W. Elshami, Aisha Alshuweihi, Abdulmunhem Obaideen, B. Issa, S. Shetty
Advances in Biomedical and Health Sciences. Journal article, published 2023-01-01. DOI: 10.4103/abhs.abhs_29_22 (https://doi.org/10.4103/abhs.abhs_29_22)
Abstract
Background: Evaluating the performance of available machine learning software is fundamental to ensuring trustworthiness and improving automated diagnosis. This study compared the diagnostic accuracy of artificial intelligence (AI) system reporting with that of human readers for portable chest anteroposterior (AP) x-rays acquired from patients in a semi-recumbent position.

Methods: Ninety-four patients who underwent portable chest AP radiography with clinically suspected or confirmed COVID-19 were included in the study; among them, 65 were COVID-19 positive and 29 had symptoms. High-resolution computed tomography (HRCT) of the chest was available for 39 patients. Images were read by two radiologists (R1, R2) and by the AI system. In case of disagreement between R1 and R2, a third radiologist (R3) read the images; however, if chest HRCT was available, it was used instead of R3. Thus, the gold standard was HRCT or the concordant reading of two radiologists (R1 = R2, R1 = R3, or R2 = R3).

Results: The sensitivity of the AI system in detecting pleural effusion and consolidation was 100% and 91.3%, respectively; the specificity was 84% and 61%, respectively. There was no good agreement between the gold standard and AI for other chest pathologies.

Conclusion: Significant moderate agreement between AI and the gold standard was shown for pleural effusion and consolidation. There was no significant agreement between the gold standard and AI for widened mediastinum, collapse, and other pathologies. Future multicenter studies with larger sample sizes, multiple clinical indications, and multiple radiographic views are recommended.
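The reference-standard rule and the accuracy measures described above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each finding is recorded as a binary present/absent label per image, and it reads the Methods as meaning that HRCT replaces R3 only when R1 and R2 disagree; the abstract leaves that point open. The function names `gold_standard` and `sensitivity_specificity` and the toy labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def gold_standard(r1: bool, r2: bool,
                  r3: Optional[bool] = None,
                  hrct: Optional[bool] = None) -> bool:
    """Reference label for one finding on one image.

    Assumed reading of the abstract: if R1 and R2 agree, their label is used;
    otherwise HRCT decides when available, and R3 decides when it is not.
    """
    if r1 == r2:
        return r1
    if hrct is not None:
        return hrct
    if r3 is None:
        raise ValueError("R1 and R2 disagree: an HRCT or R3 reading is required")
    # With binary labels, R3 necessarily matches either R1 or R2.
    return r3


def sensitivity_specificity(ai: Sequence[bool],
                            reference: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(ai) != len(reference):
        raise ValueError("ai and reference must have the same length")
    tp = sum(1 for a, r in zip(ai, reference) if a and r)
    fn = sum(1 for a, r in zip(ai, reference) if not a and r)
    tn = sum(1 for a, r in zip(ai, reference) if not a and not r)
    fp = sum(1 for a, r in zip(ai, reference) if a and not r)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, invented for illustration only (not study data).
toy_ai = [True, True, False, True, False]
toy_reference = [True, True, False, False, False]
toy_sens, toy_spec = sensitivity_specificity(toy_ai, toy_reference)
```

The abstract does not say which statistic was used to assess agreement between AI and the gold standard, so the sketch covers only sensitivity and specificity.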