Purpose: To externally test four chest radiograph classifiers on a large, diverse, real-world dataset with robust subgroup analysis.
Materials and methods: In this retrospective study, adult posteroanterior chest radiographs (January 2016-December 2020) and associated radiology reports from Trillium Health Partners in Ontario, Canada, were extracted and de-identified. An open-source natural language processing tool was locally validated and used to generate ground truth labels for the 197 540-image dataset based on the associated radiology report. Four classifiers generated predictions on each chest radiograph. Performance was evaluated using accuracy, positive predictive value, negative predictive value, sensitivity, specificity, F1 score, and Matthews correlation coefficient for the overall dataset and for patient, setting, and pathology subgroups.
Results: Classifiers demonstrated 68%-77% accuracy, 64%-75% sensitivity, and 82%-94% specificity on the external testing dataset. Algorithms showed decreased sensitivity for solitary findings (43%-65%), patients younger than 40 years (27%-39%), and patients in the emergency department (38%-60%) and decreased specificity on normal chest radiographs with support devices (59%-85%). Differences in sex and ancestry represented movements along an algorithm's receiver operating characteristic curve.
期刊介绍:
Radiology: Artificial Intelligence is a bi-monthly publication that focuses on the emerging applications of machine learning and artificial intelligence in the field of imaging across various disciplines. This journal is available online and accepts multiple manuscript types, including Original Research, Technical Developments, Data Resources, Review articles, Editorials, Letters to the Editor and Replies, Special Reports, and AI in Brief.