Purpose: To evaluate the diagnostic performance of a deep learning (DL) model for breast US across four hospitals and assess its value to readers with different levels of experience.
Materials and methods: In this retrospective study, a dual attention-based convolutional neural network was built and validated to discriminate malignant tumors from benign tumors by using B-mode and color Doppler US images (n = 45 909, March 2011-August 2018), acquired with 42 types of US machines, of 9895 pathologic analysis-confirmed breast lesions in 8797 patients (27 men and 8770 women; mean age, 47 years ± 12 [SD]). With and without assistance from the DL model, three novice readers with less than 5 years of US experience and two experienced readers with 8 and 18 years of US experience, respectively, interpreted 1024 randomly selected lesions. Differences in the areas under the receiver operating characteristic curves (AUCs) were tested using the DeLong test.
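For readers unfamiliar with the DeLong test named above, the following is a minimal sketch of the paired (correlated) version for two sets of scores on the same lesions. It is not the authors' code: the function name `delong_test`, the helper `_placements`, and the toy inputs are illustrative assumptions, and the pure-Python loops run in O(mn) time, which suits small examples only. It assumes binary labels (1 = malignant, 0 = benign) and at least two cases in each class.

```python
import math


def _placements(pos, neg):
    """Return the DeLong structural components (V10, V01) for one score set."""
    def psi(x, y):
        # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _sample_var(values):
    """Unbiased sample variance (denominator len - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_test(labels, scores_a, scores_b):
    """Paired DeLong test; returns (auc_a, auc_b, z, two_sided_p)."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")

    v10_a, v01_a = _placements([scores_a[i] for i in pos_idx],
                               [scores_a[i] for i in neg_idx])
    v10_b, v01_b = _placements([scores_b[i] for i in pos_idx],
                               [scores_b[i] for i in neg_idx])

    auc_a = sum(v10_a) / m
    auc_b = sum(v10_b) / m

    # Var(AUC_a - AUC_b) = S10[a-b]/m + S01[a-b]/n, computed on the
    # per-case differences of the structural components.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = _sample_var(d10) / m + _sample_var(d01) / n

    diff = auc_a - auc_b
    if var == 0.0:
        # Degenerate case: no variability in the difference.
        z = 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    else:
        z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return auc_a, auc_b, z, p


# Toy data (hypothetical, not from the study).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
unaided = [0.9, 0.6, 0.4, 0.3, 0.8, 0.5, 0.2, 0.1]
assisted = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
auc_assisted, auc_unaided, z, p = delong_test(labels, assisted, unaided)
```

In a reader study such as the one described, `scores_a` and `scores_b` would be the same reader's malignancy ratings with and without model assistance on the same lesions, which is why the paired form of the test is the appropriate one.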
Results: The DL model using both B-mode and color Doppler US images demonstrated expert-level performance at the lesion level, with an AUC of 0.94 (95% CI: 0.92, 0.95) for the internal set. In external datasets, the AUCs were 0.92 (95% CI: 0.90, 0.94) for hospital 1, 0.91 (95% CI: 0.89, 0.94) for hospital 2, and 0.96 (95% CI: 0.94, 0.98) for hospital 3. DL assistance led to improved AUCs (P < .001) for one experienced and three novice radiologists and improved interobserver agreement. The average false-positive rate was reduced by 7.6% (P = .08).
About the journal:
Radiology: Artificial Intelligence is a bimonthly journal covering emerging applications of machine learning and artificial intelligence in imaging across disciplines. It is published online and accepts multiple manuscript types, including Original Research, Technical Developments, Data Resources, Review articles, Editorials, Letters to the Editor and Replies, Special Reports, and AI in Brief.