Ayman Nada, Alaa A Sayed, Mourad Hamouda, Mohamed Tantawi, Amna Khan, Addison Alt, Heidi Hassanein, Burak C Sevim, Talissa Altes, Ayman Gaballah
External validation and performance analysis of a deep learning-based model for the detection of intracranial hemorrhage.
Neuroradiology Journal (impact factor 1.3; JCR Q4, Neuroimaging). Journal article, published 2024-11-27. DOI: 10.1177/19714009241303078 (https://doi.org/10.1177/19714009241303078). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11603421/pdf/
Abstract
Purpose: We aimed to externally validate an FDA-approved deep learning (DL) model for labeling intracranial hemorrhage (ICH) cases and to assess its performance on a real-world, heterogeneous clinical dataset. We also evaluated how patients' risk factors influenced the model's performance and gathered satisfaction feedback from radiologists of varying ranks.
Methods: This prospective, IRB-approved study included 5600 non-contrast CT scans of the head from various clinical settings, namely emergency, inpatient, and outpatient units. Patients' risk factors were collected and tested for their effect on the DL model's performance using univariate and multivariate regression analyses. The DL model's output was compared with the radiologists' interpretation to determine the presence or absence of ICH, with subsequent classification into ICH subcategories. Key metrics, including accuracy, sensitivity, specificity, positive predictive value, and negative predictive value, were calculated. The receiver operating characteristic (ROC) curve and the area under the curve were determined. Additionally, radiologists of varying ranks completed a questionnaire assessing their experience with the model.
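As a minimal sketch of how the listed metrics follow from a 2x2 comparison between a model's output and a reference reading, the Python below computes them from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the names `ConfusionCounts` and `diagnostic_metrics` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing model output with the reference reading."""
    tp: int  # model positive, reference positive
    fp: int  # model positive, reference negative
    tn: int  # model negative, reference negative
    fn: int  # model negative, reference positive


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard diagnostic accuracy metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only (NOT taken from the study).
example = ConfusionCounts(tp=80, fp=10, tn=190, fn=20)
metrics = diagnostic_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are conditioned on the reference reading, whereas the predictive values are conditioned on the model's output, which is why the two pairs use different denominators.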
Results: The model achieved a sensitivity of 89% and a specificity of 96%. Its positive predictive value was 82%, its negative predictive value was 97%, and its overall accuracy was 94%. The area under the ROC curve was 0.954. Multivariate logistic regression revealed statistical significance for age, sex, history of trauma, operative intervention, hypertension (HTN), and smoking.
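Predictive values, unlike sensitivity and specificity, depend on how common ICH is in the scanned population. The sketch below applies Bayes' rule to the reported sensitivity (89%) and specificity (96%) across several prevalence values. The prevalence values are hypothetical, since the abstract does not report prevalence, and the function name `predictive_values` is an assumption introduced here.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported in the abstract (rounded values).
SENSITIVITY = 0.89
SPECIFICITY = 0.96

# Hypothetical prevalence values, for illustration only.
for prevalence in (0.05, 0.10, 0.20, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, so the reported predictive values would not necessarily carry over to a setting with a different case mix.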
Conclusion: Our study highlights the satisfactory performance of the DL model on a diverse real-world dataset; the model also received positive feedback from radiology trainees.
About the journal:
NRJ - The Neuroradiology Journal (formerly Rivista di Neuroradiologia) is the official journal of the Italian Association of Neuroradiology and of several scientific societies from around the world. Founded in 1988 as Rivista di Neuroradiologia, it became NRJ - The Neuroradiology Journal in June 2006. It is published bimonthly.